What’s Your Story: Desney Tan

MSR Podcast

In this new Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today.

Across his time at Microsoft, Desney Tan, Managing Director of Microsoft Research Redmond, has had the experience of shepherding research ideas into products multiple times, and much like the trajectory of research, his life journey has been far from linear. In this episode, Tan shares how he moved to the United States from Singapore as a teenager, how his self-described “brashness” as a Microsoft intern helped shift the course of his career, and how human impact has been a guiding force in his work.

photos of Desney Tan throughout his life

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

DESNEY TAN: Early in the career, I always looked at successful people and it always felt like they had a goal, and it was a very nice straight line to get there, and they did all the right things, and I don’t know anyone today that I deem to be successful that had a straight-line path and did all the right things.

[TEASER ENDS]

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke. In my 10 years with Microsoft, across product and research, I’ve been continuously excited and inspired by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

[MUSIC ENDS]

In this episode, I’m talking with Desney Tan, a longtime Microsoft executive whose experience with the company spans computational neuroscience, human-computer interaction, and health and the life sciences. His research contributions have impacted a wide range of Microsoft products. Desney was previously Vice President and Managing Director of Microsoft Health Futures and is now Managing Director of Microsoft Research Redmond.

Much like the trajectory of research, Desney’s life journey has been far from linear. He left Singapore to attend school in the United States as a teenager, then worked in autonomous navigation for NASA and in VR for Disney before landing here at Microsoft. Here’s my conversation with Desney, beginning with his childhood.


DESNEY TAN: Born and raised in Singapore. Dad was an architect. Mom did everything, um, to run the family. When I turned 13, Mom and Dad came to me and they said, “Hey, would you like to try something new?” I said sure. You know, I had no idea what they, they were thinking. Two weeks later, they sent me to the US to study. Um, looking back, sometimes I flippantly claim I was just eating too much at home and so they had to send me away. [LAUGHTER] But actually, it was, you know, I think it was prescient on their part. They sort of looked at my path. They looked at the education system. They looked at the way I learned and the way I created and the way I, I acted, and they somewhat realized, I think, very early on that the US was a great … would, would be a great place for me to sort of flourish and, and sort of experiment and explore and, and grow.

GEHRKE: And so how did it work? You just went by yourself?

TAN: So I had an aunt and an uncle in Louisiana. Spent a couple of years in high school there. Um, sort of … fun, fun side story. They looked at … the high school looked at my math curriculum in Singapore, and they said, “Oh, he’s at least a year ahead.” So they skipped me a year ahead. And then through some weird miscalculations on their part, they actually ended up skipping me nearly two years ahead.

GEHRKE: Oh, wow.

TAN: And by the time we realized, I had already integrated into school, the courses were just fine, and so I ended up skipping a lot of years.

GEHRKE: So you ended up graduating then high school what …

TAN: Pretty early. I was 15.

GEHRKE: 15 …

TAN: Graduated from high school. Got to college. Had no idea what I wanted to do. What 15-year-old does? Um, ended up in liberal arts college, so University of Notre Dame. So, so I don’t know how Mom let me do this, but, you know, I got all my acceptance letters together. I said I don’t know anything about college. I don’t know where I want to go. I don’t know what I want to do. I’m going to toss all the letters up in the air, and the one that lands on top is the school I’m going to.

GEHRKE: And that’s what you did?

TAN: Yeah, that’s exactly what I did. Um, divine intervention, let’s call it. Notre Dame landed on top. You know, switched majors a bunch of times. I started off in aerospace, did chemical engineering, civil engineering. I was on the steps of becoming a priest until they sent me away. They said, “Hey, if it’s not a mission and a calling, go away and come back later.” And ended up with a computer engineering degree. You know, I had great mentors, you know, who looked out for me. I had a couple of guardian angels out there, you know, guided me along, and that, you know, that was just a wonderful breadth of education. Went back to the military for a couple of years. Uh, served there for a couple of years. Did a bunch of growing up.

GEHRKE: That, that’s quite a change, right, from being like in college and then going back to the military.

TAN: Yeah, yeah, it was a mandatory service in Singapore, and so I went back. Had a ton of fun. Learned a bunch of stuff about the world, about myself. I claim the military is one of the few organizations in the world that takes an 18-year-old and teaches them leadership, um, and teaches them about themselves and teaches them about how to push themselves and where the boundaries are. And so fairly accidentally, I, I got to benefit from all of that. At the end of that, I realized my computer engineering degree was, you know … I realized two things. One, my computer engineering degree was a little outdated by the time I got out of the military, and two, that I didn’t love being told what to do. [LAUGHS]

GEHRKE: [LAUGHS] OK.

TAN: So I came back. Uh, did grad school. I was at Carnegie Mellon. Ended up getting hooked up with a wonderful professor, Randy Pausch.

GEHRKE: “The Last Lecture,” right?

TAN: Who gave “The Last Lecture” in his last days. You know, learned a ton from him not only about academics and scholarship, but also about life and, um, and leadership.

GEHRKE: And so he was at the intersection of graphics and HCI, right, if I remember correctly?

TAN: That’s correct, yeah.

GEHRKE: So what is your PhD in?

TAN: My PhD was actually looking at, um, distributed displays in virtual reality. So how, how the human brain absorbs information and uses the world around us to be able to, um, interact with digital data and analog data.

GEHRKE: Early on in a really important field already.

TAN: Yeah, no, it was great. Spent a couple of years with NASA in the Jet Propulsion Lab doing autonomous navigation. This was the early days of, um, you know, AI and, and planning.

GEHRKE: So those aerospace engineering classes, were they actually useful?

TAN: They, you know, all the classes I took ended up coming back to be useful in a number of ways. And actually, um, you know, the diversity of viewpoints and the diversity of perspectives is something that sat very deeply in me. So anyways, you know, spent some time at NASA. Um, spent some time at Disney with the Imagineers building virtual reality theme parks. This was the late ’90s, early 2000s. So Disney at the time had all the destination theme parks: Disneyland, Disney World, places you would fly for a week and, and spend a week at. Their goal was really to build a theme park in a box that they could drop down into the urban centers, and the only way to get a theme park into a building was digital experiences. And so this was the very early days of VR. We were using, you know, million-dollar military-grade headsets. They were, you know, 18, 19 pounds.

GEHRKE: Wow.

TAN: Disney was one of the companies—and, you know, it’s sat with me for a very long time—that designs experiences for every single person on earth, right. So these headsets had to work on your 2-year-old. They had to work on your 102-year-old. They had to work on, you know, a person who spoke English, who read, who didn’t read, who didn’t speak English. You know, tall, short, large, small, all of it. And they did a wonderful job finding the core of what it means to be human and designing compelling experiences for all of us, um, and that was a ton of fun. We ended up deploying these facilities called DisneyQuest. There was one in Chicago; one in Orlando. They just closed them down a couple of years ago because actually all the VR rides have now migrated into the theme parks themselves.

GEHRKE: And it was actually a VR experience? You would go and sit …

TAN: It was a VR experience. They dropped them down. They had basically buildings. There were, you know, floors full of classic and new-age arcade games. And then there were VR experiences that you could run around in and, um, interact with.

GEHRKE: Interesting. I’ve never, I mean, I lived in Madison for four years, but I’ve never heard of that Quest experience. It seems to be a fun way to experience Disney … by not going to any of the, the theme parks.

TAN: It was super fun. Um, yeah, we, I personally got to work on a couple of rides. There was Pirates of the Caribbean.

GEHRKE: Oh, wow.

TAN: So you put on … a family would put on headsets and kind of run around, shooting pirates and what have you. And then the Aladdin ride was I thought one of the better rides.

GEHRKE: Oh, wow, yeah …

TAN: Where you sit on a magic carpet as you can imagine.

GEHRKE: Oh yeah. That sounds fun.

TAN: It was perfectly scripted for it. Um, anyways, ended up at Microsoft largely because entertainment technology while a lot of fun and while I learned a ton was, uh, strangely unsatisfying, and there was something in me and, you know, that was seeking human impact at scale in a much deeper and much more direct way. And so I thought I’d be here for three or four years largely to learn about the tech industry and how, you know, large pieces of software were deployed before going off and doing the impact work. And I’ve now been here for nearly 20 years.

GEHRKE: And where do you start? Did you start out right away at Microsoft Research, or were you first in a product group?

TAN: My career here has been a cycle of starting in Microsoft Research, incubating, failing, trying again. Failing again. You know, at some point, screaming “Eureka!” [LAUGHTER] and then doing my tours of duty through the product groups, commercializing … productizing, commercializing, you know, seeing it to at least robustness and sustainability if not impact and then coming back and doing it again. Um, and the thing that’s kept me here for so long is every time I’ve completed one of those cycles and thought I was done here, um, the company or the world in some cases would throw, you know, a bigger, thornier, juicier thing in front of me, and Microsoft has always been extremely encouraging, um, and supportive of, you know, taking on those challenges and really innovating and opening up all new, whole new opportunities.

GEHRKE: I mean this whole cycle that you’re talking about, right, of sort of starting out small at MSR (Microsoft Research), you know, having sort of the seed of an, of an idea and then growing it to a bigger project and at some point in time transitioning, transitioning it into, into the product group and actually really making it a business. So tell me about … you said you have done this, you know, a few times and, you know, once you were even highly successful. I’d love to learn more about this because I think it’s so inspiring for everybody to learn more about this.

TAN: Yeah. No, it’s been magical. I have to say before going into any of these stories that none of these paths were architected. As, as you well know, they never are. So actually my, my first experience was as an intern here, and, you know, I was a sort of brash, perhaps rash, intern. I was working on virtual reality, and in the evenings, I would meet with folks around the company to learn more, and I met with a team that was building out multi-monitor functionality in Windows NT. Prior to Windows NT, Windows computers had one and only one monitor, and they started to build the functionality to build multiple. As the brash grad student, you know, I, I had different thoughts about how this should be implemented and, you know, couldn’t convince anyone of it. And so in the evenings, I ended up starting just to build it. At the end of the internship, in addition to all the stuff I was doing, I said, “Hey, by the way, I’ve built this thing. You know, take it or leave it. Here you go.” And it ended up being the thing that was implemented in NT for a variety of reasons. That really got me hooked. Prior to that, I had imagined myself an academic, going back and, you know, being a professor somewhere in academia. And as soon as I saw, you know, the thing I did and that, you know, Microsoft actually polished up and made good in the real world…

GEHRKE: And shipping in millions and millions of desktops, right?

TAN: That’s right. There was no getting away from that.

GEHRKE: OK, right.

TAN: When I first got here, MSR had actually hired me thinking I’d work on virtual reality. And I got here and I said, hey, VR … I’ve just done a ton of VR. VR is probably 15 or 20 years out from being democratized and consumerized. I’m going to do something for a couple of years, and then I’ll come back to this. Um, so I got into computational neuroscience, looking at, um, sensors that scanned or sensed the brain and trying to figure out mental state of people. I had the imagination that this would be useful both for direct interaction but also for understanding human behavior and human actions a little bit better. We won’t go into that work, but, um, what happened with the productization of that was I went … this was at the time when Bill Gates was actually pushing very hard on tablet PCs and the stylus and the pen as an interesting input modality. The realization we had was, hey, we’ve got spatial temporal signal coming off the brain we’re trying to make sense of; the tablet guys had spatial temporal signal coming off a pen they were trying to make sense of in handwriting recognition. And so we went over and we said, hey, what interesting technological assets do you have that we can steal and use on the brain. Turns out they were more convincing than us. And, and so they said, hey, actually you’re right. The problems do look similar. What do you have that you could bring over? And so if you look at the handwriting recognition system even that stands today, it’s a big mess of a neural network, um, largely because that came out of interpreting neural signal that got transferred into the handwriting recognizer.

GEHRKE: I see.

TAN: And so I ended up spending two, maybe 2 1/2 years, working not only on the core recognition engine itself but also the entire interface that ran around the tablet PC and, you know, the tablet input panel.

GEHRKE: But that’s sort of an interesting realization, right. You came because you thought you would land Technology X for Application Y, but actually you land it for a very different application.

TAN: That’s right. And, and each cycle has had a little bit of that surprise and that serendipity, which we’ve now built into the way we do research. And, um, you sort of head down a path because it moves you forward as quickly as possible. But you keep your eyes peeled for the serendipitous detours and the, the discovery that comes out of that. Um, and I think that’s what makes Microsoft Research as an organization, um, so compelling and, and so productive, right, as … we, we do run very fast, but we have the freedoms and, you know, the flexibility really to take these windy paths and to take these detours and, and to go flip over, you know, rocks, some of which end up being, you know, dead ends.

GEHRKE: Right.

TAN: Others of which end up being extremely productive.

GEHRKE: Right. And so if you think about, let’s say, a junior person in the lab, right. They’re sort of looking at you and your career and saying, “Wow, what steps should I take to, you know, become as successful as Desney?” What, what advice would you give them, right? Because it seems like you have always had sort of MSR as sort of your rock, right. But then you jumped over the river a few times, but then came back and jumped over again. Came back.

TAN: First off, I, I don’t know that Desney has been so successful so much as, you know, the people around Desney have been extremely successful and Desney’s gotten to ride the wave. But, yeah, no, I mean every, everyone’s … you know, as I look around the table and the board, you know, everyone has a slightly different journey, and everyone has slightly different work styles and mindsets and personalities and risk tolerance and what have you. Um, so the first thing really is, is not to try to fully emulate anyone else. I always claim we’re, we’re kind of like machine learning models, right. We, we should be taking input data, positive and negative, and building our models of ourselves and our models of the world and really operating within that context. I think having a North Star, whether it’s implicit or explicit, has been extremely useful for myself and the people around me.

GEHRKE: By North Star, you mean like a philosophical North Star or technical North Star or North Star in what you want to be? What, what do you mean?

TAN: Yes, yes, yes. All of it.

GEHRKE: So tell me more about your personal North Star.

TAN: For, for, for us … for myself, it’s really been about human impact, right. Everything we do is centered on human impact. We do research because it’s part … it’s, it’s one of the steps towards achieving human impact. We productize because it’s one of the steps towards human impact. Our jobs are not ever done until we hit the point of human impact, and then they’re not quite done because there’s always more to be had. Um, so I think having that, you know, perhaps a value system, um, at least, you know, sort of grounds you really nicely and, and creates, I think, or can create a courage and a bravery to pursue, which I think is important. You know, different people do this differently, but I have been very lucky in my career to be surrounded by people that have been way, way, way better than myself, um, and, and extremely generous of their passions and their skills and their expertise and their time. You know, ask it and just about any successful person by whatever definition and I think they’ll tell you the same thing, that it’s the people around. And then being tolerant, maybe even seeking of, this windy path. You know, when I was early in the career, I always looked at successful people, or people I deemed to be successful, and it always felt like they had a goal, and it was a very nice straight line to get there, and they did all the right things and, and took all the right steps, and, um, and I don’t know anyone today that I deem to be successful that had a straight-line path and did all the right things.

GEHRKE: Yeah, and it’s often these setbacks in, you know, one’s career that actually give you often some of the best learnings because either of some things that you’ve sort of done structurally wrong or some things that, you know, you really need more experience and, and, you know, that setback gave you that experience. So, so one other question around this is also just around change, right. Because especially right now, we’re living in this time where maybe the rate of change especially in AI is kind of unprecedented. I mean, benchmarks are falling in like a quarter of the time than they would have thought to be lasting. You know, we all have played with ChatGPT. Just extrapolate that out a few more months, if not years, right. OpenAI is here talking about AGI. So how do you think about change for yourself and evolution and learning, and do you have any, any routines? How, how do you keep up with everything that’s going on?

TAN: Yeah, it’s, uh … good question. I guess the overarching philosophy, the approach that I’ve taken with my career, is that everything’s constantly in change. You know, the rate of change may vary, and the type of change and the, the mode of change might vary, but everything’s constantly changing, and so our jobs at any given point are to understand the context in the world, in the organization, with the people around you, and really be doing the best that you can at any given moment. And as that context changes, you kind of have to dynamically morph with it. I subscribe pretty fully to the Lean Startup model. So, you know, formulate hypotheses … and this is the research process really, right. Formulate hypotheses, test them as quickly as you can, learn from that, and then do it again, and rinse and repeat. And then … and, you know, you could sort of plot your path and steer your path through based on that. Um, and so we operate very much on that. As, as the world changes, we change. As, you know, the org changes, we change. And there’s a certain robustness that comes along with that. It’s not all roses, and obviously change is and uncertainty is, is a difficult context to operate in.

GEHRKE: And super interesting because it also speaks to some of the things that one should, um, sort of look out for when doing research, right. If you’re saying, well, I have these hypotheses and I want to quickly test them, right, if I’m in a field or if I work with data that I, you know, cannot really use, where the testing of an hypothesis will take months if not years to bring out, this might not be the best research direction. So how should I think about sort of research, the choice of research problems …

TAN: It’s a good question, yeah.

GEHRKE: … sort of with this, with this change in mind, right?

TAN: Yeah, yeah. Um, I don’t know. I, I’m, I guess … again, I’m brash on this. There are, there are very few problems and spaces that can’t be navigated, um, and so things that seem impossible at first glance are often navigable, you know, with a little bit or maybe sometimes a lot of creativity. Um, you know, if our jobs are to take Microsoft and the rest of the world to places that Microsoft and the rest of the world might not get itself to—hopefully positive places—then we’re going to have to do things in a way that is probably unnatural for Microsoft and the rest of the world, um, to get there. And the company and the organization, MSR, has been extremely supportive of that level of creativity.

GEHRKE: Can you give an example of that for …?

TAN: We had Cortana, which is our speech recognition and conversational engine. We didn’t really have a platform to deploy that on. At the same time, we saw a bunch of physicians, clinicians, struggling with burnout because they were seeing patients for less than half the time. They were spending more of their time sitting in front of the computer, documenting stuff, than they were seeing patients and treating patients. We said, hey, what if you put the two together? What if you sat in the room, listened to the doctor and the patient, and started to automatically generate the documentation? And in fact, if you did that, you could structure the data, which leads for better downstream analytics. Um, and if you did that, you could start to put machine learning and AI and smarts into the system, as well. That project, which was called EmpowerMD, led eventually—after a bunch of missteps and a bunch of learnings and a bunch of creativity—to a very deep partnership with Nuance, um, and creation of Dragon Ambient eXperience and the eventual acquisition thereof of that company. And, um, it’s just a wonderful product line. It’s, you know, kind of a neat way to think about data and intelligence and human augmentation and integration into otherwise messy, noisy human processes. Um, but yeah, you know, I think with enough creativity, um, you know, we’ve, we’ve bumped into very, very few brick walls.

GEHRKE: And what I love about the story is that it’s not about a specific technology choice, but it’s more about a really important problem, right.

TAN: That’s right. Yeah. If your problem is right and if your conviction is right about the value of the solution …

GEHRKE: Yeah.

TAN: …you build teams around it. You build processes around it. You’re creative in the way you execute. And, um, I’d say more times than not, we end up getting there.

GEHRKE: Yeah, well, I love that insight because it’s often much more valuable to solve an important problem than to land some deep technology on a problem that very few people care about …

TAN: I think that’s right.

GEHRKE: …and it seems like that’s what you have done here.

TAN: Yeah.

GEHRKE: Well, it was really great and inspiring to hear from you, Desney. Thanks so much for the conversation.

[OUTRO MUSIC]

TAN: Yeah, thanks for having me, Johannes.

GEHRKE: To learn more about Desney’s work or to see photos of Desney during his winding journey to Microsoft, visit aka.ms/ResearcherStories.



Into the Omniverse: OpenUSD Enhancements for Autodesk Maya Make 3D Workflows a Ferret-Tale

Editor’s note: This post is part of Into the Omniverse, a series focused on how artists, developers and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

In 3D art and design, efficient workflows are essential for quickly bringing creative visions to life.

Universal Scene Description, aka OpenUSD, is a framework that enhances these workflows by providing a unified, extensible ecosystem for describing, composing, simulating and collaborating within 3D worlds. OpenUSD is a key technology in Autodesk’s suite of products and solutions, across media and entertainment; architecture, engineering and construction; and product design and manufacturing.

Unveiled at the AU 2023 conference this week, the latest OpenUSD updates to Autodesk Maya enable artists and technical professionals to create and manipulate OpenUSD assets with greater control and efficiency, while also ensuring more efficient and accurate 3D workflows.

Bridging the Digital and Real Worlds With Maya and OpenUSD

Many creators are using Maya and OpenUSD to propel their 3D workflows.

Karol Osinski is a 3D artist at S20M, an architectural and design firm that specializes in tackling unique, bold and elegant projects. When it comes to creating architectural visualizations, Osinski says the biggest challenge is matching the digital world to the real one.

Using USD and creative tools such as Maya, SideFX Houdini and Epic Games’ Unreal Engine, Osinski creates high-quality visuals for clients while accelerating his architectural workflows.

Osinski’s panoramic view from the 20th floor terrace in the Upper East Side

“OpenUSD provides the possibility of bridging different tools like never before,” said Osinski. “I love how accessible USD is for first-time users and how it opens opportunities to make designs very complex.”

“Sir Wade” Neistadt, an animator and YouTube creator, aims to make animation and 3D education more accessible through his video tutorials and industry training. The first step of his unique animation workflow is to act out his animations on camera. He then translates them in Maya to begin his animation work before using USD to export them to other 3D software, including Blender, for finishing touches.

The making of Sir Wade’s VFX robot animation

3D artists at NVIDIA are also experiencing the power of Maya and OpenUSD. Technical specialist Lee Fraser led the “Ferret-Tale Project” to showcase character creation and animation workflows enabled by OpenUSD and generative AI.

To create the demo, Fraser and his team collaborated across 3D applications like Blender, Autodesk Maya and Reallusion Character Creator through OpenUSD Connectors. This allowed them to reduce the data prep and import and export time that’s usually required when working with multiple data sources.

“My favorite thing about using OpenUSD is not having to think about where the 3D files I use originated from,” Fraser said. “It was also easy to use USD layers to experiment with applying different animation clips with different characters.”

Members of the creative community joined a recent livestream to share their workflows using Autodesk tools, OpenUSD and NVIDIA Omniverse, a development platform for connecting and building OpenUSD-based tools and applications.

Whether adjusting lighting conditions in an environment or looking at building designs from the street view, designers in architecture, engineering, construction and operations are advancing their work with AI. Learn more by watching the replay:

Shaping the Future of 3D With More Efficient Workflows

AU 2023 attendees experienced how Autodesk is enhancing Maya with its new OpenUSD plug-in to provide additional practical workflows for various production processes. The software’s latest features include:

  • Simplified asset sharing: Designers can now use relative paths when creating OpenUSD stages, allowing for easy asset sharing between different systems. This includes support for sublayers, references, payloads and textures.
  • Enhanced control: Plug-in developers and technical directors can overwrite the default prim writers in Maya USD to gain complete control over their OpenUSD exports.

Plus, Autodesk introduced impressive capabilities to LookdevX in Maya, a look-development tool that lets users create OpenUSD shade graphs and custom materials in Maya. These new features include:

  • Streamlined shader creation: Users can employ a unified shader workflow, replacing the need for multiple shaders. They can select their desired shader type within the parameters panel, with intuitive error messages guiding them to the correct selection.
  • Efficient operations: Creators can copy, paste and duplicate shaders and materials using the Outliner and LookdevX tool sets, with the option to include or exclude connections.
  • Seamless color management: LookdevX in Maya integrates with color managers in other digital content creation apps to ensure accurate color representation. Color management data is precisely embedded in USD files for accurate reading.
  • Advanced graphing: Users can explore advanced graphing options with the integrated component workflow, supporting multichannel Extended Dynamic Range (EXR) workflows within USD, MaterialX or Arnold shading graphs.
  • Efficient troubleshooting: Solo nodes enable faster look-development workflows and efficient graph troubleshooting. Users can inspect renders of upstream nodes, supporting both Autodesk Arnold and MaterialX graphs, including materials, shaders and compounds.

Access to default prim options in Maya UI

Get Plugged Into the World of OpenUSD

Anyone can build their own Omniverse extension or Connector to enhance their 3D workflows and tools. Explore the Omniverse ecosystem’s growing catalog of connections, extensions, foundation applications and third-party tools.

Autodesk and NVIDIA are founding members of the Alliance for OpenUSD (AOUSD), together strengthening an open future with USD. To learn more, explore the AOUSD forum and check out resources on OpenUSD.

Share your Autodesk Maya and Omniverse work through November as part of the #SeasonalArtChallenge. Use the hashtag to submit an autumn harvest-themed scene for a chance to be featured on the @NVIDIAStudio and @NVIDIAOmniverse social channels.

Get started with NVIDIA Omniverse by downloading the standard license for free, or learn how Omniverse Enterprise can connect your team.

Developers can check out these Omniverse resources to begin building on the platform. 

Stay up to date on the platform by subscribing to the newsletter and following NVIDIA Omniverse on Instagram, LinkedIn, Medium, Threads and Twitter.

For more, check out our forums, Discord server, Twitch and YouTube channels.


More Games, More Wins: PC Game Pass Included With Six-Month GeForce NOW Memberships

The fastest way to give the gift of cloud gaming starts this GFN Thursday: For a limited time, every six-month GeForce NOW Ultimate membership includes three months of PC Game Pass.

Also, the newest GeForce NOW app update is rolling out to members, including Xbox Game Syncing and more improvements.

Plus, take advantage of a heroic, new members-only Guild Wars 2 reward. It’s all topped off by support for 18 more games in the GeForce NOW library this week.

Give the Gift of Gaming

PC Game Pass bundle
Pair PC Game Pass with a GeForce NOW Ultimate bundle for the ultimate gaming gift.

Unwrap the gift of gaming: For a limited time, gamers who sign up for the six-month GeForce NOW Ultimate membership will also receive three free months of PC Game Pass — a $30 value.

With it, Ultimate members can play a collection of high-quality Xbox PC titles with the power of a GeForce RTX 4080 rig in the cloud. Jump into the action in iconic franchises like Age of Empires, DOOM, Forza and more, with support for more titles added every GFN Thursday.

Seamlessly launch supported favorites across nearly any device at up to 4K resolution and 120 frames per second, or at up to 240 fps with NVIDIA Reflex technology in supported titles for the lowest-latency streaming.

This special offer is only here for a limited time, so upgrade today.

Sync’d Up

Xbox and Ubisoft+ game library sync
Look who just joined the party!

With so many games ready to stream, it might be hard to decide what to play next. The latest GeForce NOW app update, currently rolling out to members, is here to help.

Members can now connect their Xbox accounts to GeForce NOW to sync the games they own to their GeForce NOW library. Game syncing lets members connect their digital game store accounts to GeForce NOW, so all of their supported games are part of their streaming library. Syncing an Xbox account will also add any supported titles a member has access to via PC Game Pass — perfect for members taking advantage of the latest Ultimate bundle.

The new update also adds benefits for Ubisoft+ subscribers. With a linked Ubisoft+ account, members can now launch supported Ubisoft+ games they already own from the GeForce NOW app, and the game will be automatically added to “My Library.” Get more details on Ubisoft account linking.

Version 2.0.58 also includes an expansion of the new game session diagnostic tools to help members ensure they’re streaming at optimal quality. It adds codec information to the in-stream statistics overlay and includes other miscellaneous bug fixes. The update should be available for all members soon.

A Heroic Offering

Guild Wars 2 reward on GeForce NOW
Rewards fit for a hero.

This week, members can receive Guild Wars 2 “Heroic Edition,” which includes a treasure trove of goodies, such as the base game, Legacy Armor, an 18-slot inventory expansion and four heroic Boosters. It’s the perfect way to jump into ArenaNet’s critically acclaimed, free-to-play, massively multiplayer online role-playing game.

It’s easy to get membership rewards for streaming games on the cloud. Visit the GeForce NOW Rewards portal and update the settings to receive special offers and in-game goodies.

Members can also sign up for the GeForce NOW newsletter, which includes reward notifications, by logging into their NVIDIA account and selecting “Preferences” from the header. Check the “Gaming & Entertainment” box and “GeForce NOW” under topic preferences.

Ready, Set, Go

Remnant II DLC on GeForce NOW
A new DLC awakens.

The first downloadable content for Gearbox’s Remnant 2 arrives in the cloud. The Awakened King brings a new storyline, area, archetype and more to the dark fantasy co-op shooter — stream it today to experience the awakening of the One True King as he seeks revenge against all who oppose him.

Catch even more action with the 18 newly supported games in the cloud:

  • Spirittea (New release on Steam, Nov. 13)
  • KarmaZoo (New release on Steam, Nov. 14)
  • Naheulbeuk’s Dungeon Master (New release on Steam, Nov. 15)
  • Warhammer Age of Sigmar: Realms of Ruin (New release on Steam, Nov. 17)
  • Arcana of Paradise —The Tower (Steam)
  • Blazing Sails: Pirate Battle Royale (Epic Games Store)
  • Disney Dreamlight Valley (Xbox, available on PC Game Pass)
  • Hello Neighbor 2 (Xbox, available on PC Game Pass)
  • Overcooked! 2 (Xbox, available on PC Game Pass)
  • RoboCop: Rogue City (New release on Epic Games Store)
  • Roboquest (Xbox, available on PC Game Pass)
  • Rune Factory 4 Special (Xbox and available on PC Game Pass)
  • Settlement Survival (Steam)
  • SOULVARS (Steam)
  • State of Decay: Year-One Survival Edition (Steam)
  • The Wonderful One: After School Hero (Steam)
  • Wolfenstein: The New Order (Xbox, available on PC Game Pass)
  • Wolfenstein: The Old Blood (Steam, Epic Games Store, Xbox and available on PC Game Pass)

What are you looking forward to streaming? Let us know on Twitter or in the comments below.


Accelerating Generative AI with PyTorch: Segment Anything, Fast

This post is the first part of a multi-series blog focused on how to accelerate generative AI models with pure, native PyTorch. We are excited to share a breadth of newly released PyTorch performance features alongside practical examples of how these features can be combined to see how far we can push PyTorch native performance.

As announced during the PyTorch Developer Conference 2023, the PyTorch team rewrote Meta’s Segment Anything (“SAM”) Model resulting in 8x faster code than the original implementation, with no loss of accuracy, all using native PyTorch optimizations. We leverage a breadth of new PyTorch features:

  • Torch.compile: A compiler for PyTorch models
  • GPU quantization: Accelerate models with reduced precision operations
  • Scaled Dot Product Attention (SDPA): Memory efficient attention implementations
  • Semi-Structured (2:4) Sparsity: A GPU optimized sparse memory format
  • Nested Tensor: Batch together non-uniformly sized data into a single Tensor, such as images of different sizes.
  • Custom operators with Triton: Write GPU operations using Triton Python DSL and easily integrate it into PyTorch’s various components with custom operator registration.

We encourage readers to copy-paste code from our implementation of SAM on Github and ask us questions on Github.

A quick glimpse of increasing throughput and decreasing memory overhead

A quick glimpse of increasing throughput and decreasing memory overhead with our newly released, PyTorch native, features. Benchmarks run on p4d.24xlarge instance (8x A100s).

SegmentAnything Model

SAM is a zero-shot vision model for generating promptable image masks.

sam image masks

The SAM architecture [described in its paper] includes multiple prompt and image encoders based on the Transformer architecture. Of these, we measured performance across the smallest and largest vision transformer backbones: ViT-B and ViT-H. For simplicity, we only show traces for the ViT-B model.

Optimizations

Below we tell the story of optimizing SAM: profiling, identifying bottlenecks, and building new features into PyTorch that solve these problems. Throughout, we showcase our new PyTorch features: torch.compile, SDPA, Triton kernels, Nested Tensor and semi-structured sparsity. The following sections are progressively built upon each other, ending with our SAM-fast, now available on Github. We motivate each feature using real kernel and memory traces, using fully PyTorch native tooling, and visualize these traces with Perfetto UI.

Baseline

Our SAM baseline is Facebook Research’s unmodified model, using float32 dtype and a batch size of 1. After some initial warmup, we can look at a kernel trace using the PyTorch Profiler:
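
As a concrete illustration, a trace like this can be collected with the PyTorch Profiler and opened in Perfetto UI. The snippet below is a minimal sketch on a small stand-in module rather than SAM itself; the real benchmarking script lives in the segment-anything-fast repo.

import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in module; the actual benchmarks profile SAM end to end.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.GELU()).cuda()
x = torch.randn(8, 1024, device="cuda")

# Warm up first so one-time allocations don't pollute the trace.
for _ in range(3):
    model(x)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x)

prof.export_chrome_trace("trace.json")  # load this file in Perfetto UI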

kernel trace

We notice two areas ripe for optimization.

The first is long calls to aten::index, the underlying call resulting from a Tensor index operation (e.g., []). While the actual GPU time spent on aten::index is relatively low, aten::index launches two kernels, and a blocking cudaStreamSynchronize happens in between. This means the CPU waits for the GPU to finish processing until it launches the second kernel. To optimize SAM, we should aim to remove the blocking GPU syncs that cause this idle time.

The second is significant GPU time spent in matrix multiplication (dark green on stream 7 above). This is common in Transformers. We can significantly speed up SAM if we can reduce the amount of GPU time spent on matrix multiplication.

We can measure the throughput (img/s) and memory overhead (GiB) from out of the box SAM to establish a baseline:

throughput (img/s) and memory overhead (GiB) from out of the box SAM

Bfloat16 Half precision (+GPU syncs and batching)

To address the second issue, the significant time spent in matrix multiplication, we can turn to bfloat16. Bfloat16 is a commonly used half-precision type. Through less precision per parameter and activation, we can save significant time and memory in computation. When reducing the precision of parameters, it’s critical to validate end-to-end model accuracy.

replacing padding dtypes with half precision, bfloat16

Shown here is an example of replacing padding dtypes with half precision, bfloat16. Code is here.

In addition to simply setting model.to(torch.bfloat16), we have to change a few small places that assume the default dtype.
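
In spirit, the dtype switch looks like this minimal sketch on a stand-in module (SAM’s actual predictor also has a few hard-coded float32 dtypes that need updating, as shown in the linked code):

import torch

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.GELU())
x = torch.randn(8, 1024)

with torch.no_grad():
    ref = model(x)                     # float32 reference output

model.to(torch.bfloat16)               # converts parameters in place
with torch.no_grad():
    out = model(x.to(torch.bfloat16))  # inputs must match the new dtype

# Validate end-to-end accuracy after reducing precision.
print((out.float() - ref).abs().max())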

Now, in order to remove GPU syncs we need to audit operations that cause them. We can find these pieces of code by searching the GPU traces for calls to cudaStreamSynchronize. In fact, we found two locations that we were able to rewrite to be sync-free.

code sample 1

replacing padding dtypes with half precision, bfloat16

Specifically, we see that within SAM’s image encoder, there are variables acting as coordinate scalers, q_coords and k_coords. These are both allocated and processed on the CPU. However, once these variables are used to index into rel_pos_resized, the index operation automatically moves them to the GPU. This copy causes the GPU sync we observed above. We notice a second call to index in SAM’s prompt encoder, which we can rewrite using torch.where as shown above.
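
The sketch below illustrates both ideas on simplified shapes. It is not SAM’s exact code, but it shows why keeping the coordinate math on the GPU, or expressing the gather with torch.where, avoids the blocking synchronization:

import torch

rel_pos_resized = torch.randn(127, 64, device="cuda")

# Before: CPU-resident coordinates force a blocking host/device sync as soon
# as they are used to index a CUDA tensor.
# q_coords = torch.arange(64)[:, None]
# k_coords = torch.arange(64)[None, :]
# rel = rel_pos_resized[(q_coords - k_coords + 63).long()]  # implicit sync

# After: build the coordinates on the GPU so no sync is needed.
q_coords = torch.arange(64, device="cuda")[:, None]
k_coords = torch.arange(64, device="cuda")[None, :]
rel = rel_pos_resized[(q_coords - k_coords + 63).long()]

# For the prompt encoder, a data-dependent index can instead be expressed
# with torch.where, which stays entirely on the device.
labels = torch.randint(0, 2, (8,), device="cuda")
emb_pos, emb_neg = torch.randn(2, 8, 64, device="cuda").unbind(0)
point_embedding = torch.where(labels[:, None] == 1, emb_pos, emb_neg)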

Kernel trace

After applying these changes, we begin to see significant time between individual kernel calls. This is typically observed with small batch sizes (1 here) due to the GPU overhead of launching kernels. To get a closer look at practical areas for optimization, we can start to profile SAM inference with batch size 8:

profile SAM inference with batch size 8

Looking at the time spent per kernel, we observe that most of SAM’s GPU time is spent on elementwise kernels and the softmax operation. With this we now see that matrix multiplications have become a much smaller relative overhead.

matrix multiplications have become a much smaller relative overhead

Taking the GPU sync and bfloat16 optimizations together, we have now improved SAM performance by up to 3x.

SAM performance by up to 3x

Torch.compile (+graph breaks and CUDA graphs)

When observing a large number of small operations, such as the elementwise kernels profiled above, turning to a compiler to fuse operations can have strong benefits. PyTorch’s recently released torch.compile does a great job optimizing by:

  1. Fusing together sequences of operations such as nn.LayerNorm or nn.GELU into a single GPU kernel call, and
  2. Epilogues: fusing operations that immediately follow matrix multiplication kernels to reduce the number of GPU kernel calls.

Through these optimizations, we reduce the number of GPU global memory roundtrips, thus speeding up inference. We can now try torch.compile on SAM’s image encoder. To maximize performance we use a few advanced compile techniques such as:

  • Using torch.compile’s max-autotune mode, which enables CUDA graphs and shape-specific kernels with custom epilogues.
  • Setting TORCH_LOGS="graph_breaks,recompiles" so we can manually verify that we are not running into graph breaks or recompiles.
  • Padding the batch of images input to the encoder with zeros, which ensures compile accepts static shapes and can always use shape-specific optimized kernels with custom epilogues, without recompilations (a sketch of such padding follows the snippet below).
predictor.model.image_encoder = 
    torch.compile(predictor.model.image_encoder, mode=use_compile)
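
And, as a sketch of the padding idea mentioned above (the helper name and shapes are assumptions for illustration, not the repo’s exact code):

import torch

def pad_to_static_batch(images: torch.Tensor, batch_size: int) -> torch.Tensor:
    # Pad with zeros so the compiled encoder always sees the same batch size
    # and never recompiles or falls back to dynamic shapes.
    missing = batch_size - images.shape[0]
    if missing <= 0:
        return images
    filler = torch.zeros(
        (missing, *images.shape[1:]), dtype=images.dtype, device=images.device
    )
    return torch.cat([images, filler], dim=0)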

Kernel trace

Kernel trace

torch.compile is working beautifully. We launch a single CUDA graph, which makes up a significant portion of GPU time within the timed region. Let’s run our profile again and look at the percentage of GPU time spent in specific kernels:

the percentage of GPU time spent in specific kernels

We now see softmax makes up a significant portion of the time, followed by various GEMM variants. In summary, we observe the following measurements for batch size 8 with the above changes.

measurements for batch size 8 and above

SDPA: scaled_dot_product_attention

Next up, we can tackle one of the most common areas for transformer performance overhead: the attention mechanism. Naive attention implementations scale quadratically in time and memory with sequence length. PyTorch’s scaled_dot_product_attention operation, built upon the principles of Flash Attention, FlashAttentionV2 and xFormers’ memory-efficient attention, can significantly speed up GPU attention. Combined with torch.compile, this operation allows us to express and fuse a common pattern within variants of MultiheadAttention. After a small set of changes, we can adapt the model to use scaled_dot_product_attention.

PyTorch native attention implementation

PyTorch native attention implementation, see code here.
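
Conceptually, the change amounts to the following sketch; SAM’s real attention module also folds in relative positional embeddings, so this is a simplification:

import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Materializes the full (seq, seq) attention matrix in memory.
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def sdpa_attention(q, k, v):
    # Dispatches to fused Flash / memory-efficient kernels when available.
    return F.scaled_dot_product_attention(q, k, v)

q = torch.randn(8, 12, 1024, 64)  # (batch, heads, seq, head_dim)
k = torch.randn(8, 12, 1024, 64)
v = torch.randn(8, 12, 1024, 64)
assert torch.allclose(naive_attention(q, k, v), sdpa_attention(q, k, v), atol=1e-4)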

Kernel trace

We can now see that in particular the memory efficient attention kernel is taking up a large amount of computational time on the GPU:

memory efficient attention kernel is taking up a large amount of computational time on the GPU

Using PyTorch’s native scaled_dot_product_attention, we can significantly increase the batch size. We now observe the following measurements for batch size 32 with the above changes.

batch size 32 and above

Triton: Custom SDPA for fused relative positional encoding

Transitioning away from inference throughput for a moment, we started profiling overall SAM memory. Within the image encoder, we saw significant spikes in memory allocation:

spikes in memory allocation

Zooming in, we see this allocation happens within add_decomposed_rel_pos, on the following line:

we see this allocation happens within add_decomposed_rel_pos

The attn variable here is the addition of two smaller tensors: rel_h of shape (B, q_h, q_w, k_h, 1) and rel_w of shape (B, q_h, q_w, 1, k_w).

It’s not surprising that the memory efficient attention kernel (used via SDPA) is taking a long time with an attention bias of over 3.0 GiB. If, instead of allocating this large attn tensor, we thread the two smaller rel_h and rel_w tensors into SDPA and only construct attn as needed, we would anticipate a significant performance gain.
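
A rough back-of-the-envelope sketch of the problem; the shapes below are illustrative assumptions, not SAM’s exact values:

import torch

B, q_h, q_w, k_h, k_w = 32, 64, 64, 64, 64
rel_h = torch.empty(B, q_h, q_w, k_h, 1, dtype=torch.bfloat16)
rel_w = torch.empty(B, q_h, q_w, 1, k_w, dtype=torch.bfloat16)

# rel_h + rel_w broadcasts to a dense (B, q_h, q_w, k_h, k_w) attention bias.
bias_shape = torch.broadcast_shapes(rel_h.shape, rel_w.shape)
bias_bytes = bias_shape.numel() * rel_h.element_size()
print(bias_shape, f"{bias_bytes / 2**30:.1f} GiB")  # ~1 GiB even at these sizes

# The custom Triton kernel instead consumes rel_h and rel_w directly and
# forms each tile of the bias on the fly inside the attention computation.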

Unfortunately this is not a trivial modification; SDPA kernels are highly optimized and written in CUDA. We can turn to Triton, with its easy-to-follow tutorial on a FlashAttention implementation. After some significant digging and in close collaboration with xFormers’ Daniel Haziza, we found one case of input shapes where it is relatively straightforward to implement a fused version of the kernel. The details have been added to the repository. Surprisingly, this can be done in under 350 lines of code for the inference case.

This is a great example of extending PyTorch with a new kernel, straightforwardly built with Triton code.

Kernel trace

kernel trace

With our custom positional Triton kernel we observe the following measurements for batch size 32.

we observe the following measurements for batch size 32

NT: NestedTensor and batching predict_torch

We have spent a lot of time on the image encoder. This makes sense, since it takes up the most computational time. At this point, however, it is fairly well optimized, and the operator that takes the most time would require significant additional investment to improve.

We discovered an interesting observation with the mask prediction pipeline: for each image, there is an associated size, coords, and fg_labels Tensor. Each of these tensors has a different batch size. Each image itself is also of a different size. This representation of data looks like jagged data. With PyTorch’s recently released NestedTensor, we can modify our data pipeline to batch the coords and fg_labels Tensors into a single NestedTensor. This can have significant performance benefits for the prompt encoder and mask decoder that follow the image encoder. Invoking:

torch.nested.nested_tensor(data, dtype=dtype, layout=torch.jagged)
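
For instance, a jagged batch of per-image prompt coordinates can be packed like this; the variable names are illustrative, not the pipeline’s exact ones:

import torch

coords_per_image = [
    torch.randn(5, 2),  # 5 prompt points for image 0
    torch.randn(3, 2),  # 3 prompt points for image 1
    torch.randn(8, 2),  # 8 prompt points for image 2
]

# One NestedTensor holds the whole jagged batch; no padding, no Python loop.
coords_nt = torch.nested.nested_tensor(
    coords_per_image, dtype=torch.float32, layout=torch.jagged
)
for c in coords_nt.unbind():
    print(c.shape)  # each image keeps its own number of points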

Kernel trace

Kernel trace

we can launch kernels much faster from the CPU than the GPU can process

We can now see that we launch kernels from the CPU much faster than the GPU can process them, and that the CPU spends a long time waiting at the end of our timed region for the GPU to finish (cudaDeviceSynchronize). We also no longer see any idle time (white space) between kernels on the GPU.

With Nested Tensor, we observe the following measurements for batch size 32 with the above changes.

batch size 32 and above changes

int8: quantization and approximating matmul

We notice in the above trace that significant time is now spent in GEMM kernels. We’ve optimized enough that matrix multiplication now accounts for more inference time than scaled dot product attention.

Building on earlier learnings going from fp32 to bfloat16, let’s go a step further, emulating even lower precision with int8 quantization. Looking at quantization methods, we focus on dynamic quantization, wherein our model observes the range of possible inputs and weights of a layer and subdivides the expressible int8 range to uniformly “spread out” the observed values. Ultimately, each float input will be mapped to a single integer in the range [-128, 127]. For more information, see PyTorch’s tutorial on quantization.

Reducing precision can immediately lead to peak memory savings, but to realize inference speedups, we have to make full use of int8 throughout SAM’s operations. This requires building an efficient int8@int8 matrix multiplication kernel, as well as casting logic to translate from high to low precision (quantization) and back from low to high (dequantization). Utilizing the power of torch.compile, we can compile and fuse these quantization and dequantization routines into efficient single kernels and epilogues of our matrix multiplication. The resulting implementation is fairly short, at less than 250 lines of code. For more information on the APIs and usage, see pytorch-labs/ao.
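
The sketch below emulates the dynamic quantization arithmetic in plain PyTorch for clarity; the actual speedup comes from a fused int8 matrix multiplication kernel (see pytorch-labs/ao), not from this float round trip:

import torch

def quantize_per_row(x: torch.Tensor):
    # One scale per row so every row maps onto the full int8 range.
    scale = x.abs().amax(dim=-1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

x = torch.randn(32, 768)           # activations, observed at runtime
w = torch.randn(768, 768) * 0.02   # weights

x_q, x_scale = quantize_per_row(x)
w_q, w_scale = quantize_per_row(w.t())  # per-output-channel weight scales

# Dequantize and multiply to check how closely int8 approximates the matmul.
approx = (x_q.float() * x_scale) @ (w_q.float() * w_scale).t()
exact = x @ w
print((approx - exact).abs().max())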

While it’s common to see some accuracy regression when quantizing models at inference time, SAM has been particularly robust to lower-precision inference with minimal loss of accuracy. With quantization added, we now observe the following measurements for batch size 32 with the above changes.

batch size 32 and above changes

sparse: Semi-structured (2:4) sparsity

Matrix multiplications are still our bottleneck. We can turn to the model acceleration playbook with another classic method to approximate matrix multiplication: sparsification. By sparsifying our matrices (i.e., zeroing out values), we could theoretically use fewer bits to store weight and activation tensors. The process by which we decide which weights in the tensor to set to zero is called pruning. The idea behind pruning is that small weights in a weight tensor contribute little to the net output of a layer, typically the product of weights with activations. Pruning away small weights can potentially reduce model size without significant loss of accuracy.

Methods for pruning vary, from completely unstructured, wherein weights are greedily pruned, to highly structured, wherein large sub-components of a tensor are pruned at a time. The choice of method is not trivial. While unstructured pruning may have the theoretically smallest impact on accuracy, GPUs are highly efficient at multiplying large, dense matrices and may suffer significant performance degradation in sparse regimes. One recent pruning method supported in PyTorch seeks to strike a balance, called semi-structured (or 2:4) sparsity. This sparse storage reduces the original tensor by a significant 50%, while simultaneously producing a dense tensor output that can leverage highly performant 2:4 GPU kernels. See the following picture for an illustration.

dense tensor output that can leverage highly performant, 2:4 GPU kernels

From developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt

In order to use this sparse storage format and the associated fast kernels we need to prune our weights such that they adhere to the constraints for the format. We pick the two smallest weights to prune in a 1 by 4 region, measuring the performance vs accuracy tradeoff. It is easy to change a weight from its default PyTorch (“strided”) layout to this new, semi-structured sparse layout. To implement apply_sparse(model) we only require 32 lines of Python code:

import torch
from torch.sparse import to_sparse_semi_structured, SparseSemiStructuredTensor

# Sparsity helper functions
def apply_fake_sparsity(model):
    """
    This function simulates 2:4 sparsity on all linear layers in a model.
    It uses the torch.ao.pruning flow.
    """
    # torch.ao.pruning flow
    from torch.ao.pruning import WeightNormSparsifier
    sparse_config = []
    for name, mod in model.named_modules():
        if isinstance(mod, torch.nn.Linear):
            sparse_config.append({"tensor_fqn": f"{name}.weight"})

    sparsifier = WeightNormSparsifier(sparsity_level=1.0,
                                      sparse_block_shape=(1,4),
                                      zeros_per_block=2)
    sparsifier.prepare(model, sparse_config)
    # Compute the 2:4 masks once, then fold them into the weights.
    sparsifier.step()
    sparsifier.squash_mask()


def apply_sparse(model):
    apply_fake_sparsity(model)
    for name, mod in model.named_modules():
        if isinstance(mod, torch.nn.Linear):
            # Swap each pruned weight into the semi-structured sparse layout.
            mod.weight = torch.nn.Parameter(to_sparse_semi_structured(mod.weight))
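
As a quick usage sketch, continuing from the code above but on a toy module rather than SAM, and assuming a CUDA device with 2:4 sparse kernel support (e.g., an A100):

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.GELU(), torch.nn.Linear(1024, 1024)
).half().cuda()

apply_sparse(model)  # prune to 2:4 and swap weights to the sparse layout
with torch.no_grad():
    out = model(torch.randn(32, 1024, dtype=torch.float16, device="cuda"))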

With 2:4 sparsity, we observe peak performance on SAM with vit_b and batch size 32:

With 2:4 sparsity, we observe peak performance on SAM with vit_b and batch size 32

Conclusion

Wrapping up, we are excited to have announced our fastest implementation of Segment Anything to date. We rewrote Meta’s original SAM in pure PyTorch with no loss of accuracy using a breadth of newly released features:

  • Torch.compile: PyTorch’s native JIT compiler, providing fast, automated fusion of PyTorch operations [tutorial]
  • GPU quantization: accelerate models with reduced-precision operations [api]
  • Scaled Dot Product Attention (SDPA): a new, memory-efficient implementation of attention [tutorial]
  • Semi-Structured (2:4) Sparsity: accelerate models with fewer bits to store weights and activations [tutorial]
  • Nested Tensor: highly optimized, ragged-array handling for non-uniform batch and image sizes [tutorial]
  • Triton kernels: custom GPU operations, easily built and optimized via Triton

For more details on how to reproduce the data presented in this blog post, check out the experiments folder of segment-anything-fast. Please don’t hesitate to contact us or open an issue if you run into any technical issues.

In our next post, we are excited to share similar performance gains with our PyTorch natively authored LLM!

Acknowledgements

We would like to thank Meta’s xFormers team including Daniel Haziza and Francisco Massa for authoring SDPA kernels and helping us design our custom one-off Triton kernel.


🎉 PyTorch Docathon H2 2023 Wrap-up 🎉

We are thrilled to announce the successful completion of the Fall 2023 PyTorch Docathon! The event was a resounding success, and we want to extend our heartfelt gratitude to all the participants who made it possible. Dedication, expertise, and tireless efforts of our open-source contributors have once again helped us to improve PyTorch documentation.

This Docathon ran from Nov 1 through Nov 15 with more than 170 registrants. The energy and enthusiasm were palpable, and entrants were judged on the difficulty of their submissions, which resulted in over TBA merged pull requests. We have fixed the PyTorch docstrings and made them compatible with the PEP 257 Python Docstring Conventions guidelines. We also fixed multiple bugs in the pytorch/tutorials repo.

We want to give a special shout-out to our top contributors, who went above and beyond during this event. Your dedication and expertise have been invaluable in enhancing the PyTorch documentation and empowering developers worldwide.

Meet the top contributors:

You can see the full docathon leaderboard published here.

As we bring this Docathon to a close, we encourage each and every one of you to stay inspired and keep contributing to PyTorch documentation and code, and pushing the boundaries of what’s possible with PyTorch. Your collective efforts are shaping the landscape of deep learning and fostering innovation in the PyTorch community.

Thank you again for your participation and support. We look forward to seeing what you will achieve next!

Team PyTorch
