More than 800 students from over 100 universities around the world joined NVIDIA as the first class of our virtual internship program — I’m one of them, working on the corporate communications team this summer.
Shortly after the pandemic’s onset, NVIDIA decided to reinvent its internship program as a virtual one. We’ve been gaining valuable experience and having a lot of fun — all through our screens.
Fellow interns have contributed ideas to teams ranging from robotics to financial reporting. I’ve been writing stories on how cutting-edge tech improves industries from healthcare to animation, learning to work the backend of the company newsroom, and fostering close relationships with some fabulous colleagues.
And did I mention fun? Game show and cook-along events, a well-being panel series and gatherings such as book clubs were part of the programming. We also had several swag bags sent to our doorsteps, which included a customized intern company sweatshirt and an NVIDIA SHIELD TV.
Meet a few other interns who joined the NVIDIA family this year:
Amevor Aids Artists by Using Deep Learning
Christoph Amevor just graduated with a bachelor’s in computational sciences and engineering from ETH Zurich in Switzerland.
At NVIDIA, he’s working on a variety of deep learning projects including one to simplify the workflow of artists and creators using NVIDIA Omniverse, a real-time simulation platform for 3D production pipelines.
“Machine learning is such a powerful tool, and I’ve been interested in seeing how it can help us solve problems that are simply too complex to tackle with analytic math,” Amevor said.
He lives with another NVIDIA intern, which he said has made working from home feel like a mini company location.
Santos Shows Robots the Ropes
Beatriz Santos is an undergrad at California State University, East Bay, studying computer science. She’s a software intern working on the Isaac platform for robotics.
Though the pandemic has forced her to social distance from other humans, Santos has been spending a lot of time with the robot Kaya, in simulation, training it to do various tasks.
Her favorite virtual event this summer was the women’s community panel featuring female leaders at NVIDIA.
“I loved their inputs on working in a historically male-dominated field, and how they said we don’t have to change because of that,” she said. “We can just be ourselves, be girls.”
Sindelar Sharpens Websites
When researching potential summer internships, Justin Sindelar — a marketing major at San Jose State University — was immediately drawn to NVIDIA’s.
“The NVIDIA I once knew as a consumer graphics card company has grown into a multifaceted powerhouse that serves several high-tech industries and has contributed to the proliferation of AI,” he said.
Using the skills he’s learned at school and as a web designer, Sindelar has been performing UX analyses to help improve NVIDIA websites and their accessibility features.
His favorite intern activity was the game show event where he teamed up with his manager and mentors in the digital marketing group to answer trivia questions and fill in movie quotes.
Zhang Zaps Apps Into Shape
Maggie Zhang is a third-year biomedical engineering student at the University of Waterloo in Ontario. She works on the hardware infrastructure team to make software applications that improve workflow for hardware engineers.
When not coding or testing a program, she’s enjoyed online coffee chats, where she formed an especially tight bond with other Canadian interns.
She also highlighted how thankful she is for her team lead and mentor, who set up frequent one-on-one check-ins and taught her new concepts to improve code and make programs more manageable.
“They’ve taught me to be brave, experiment and learn as I go,” she said. “It’s more about what you learn than what you already know.”
For many interns, this fulfilling and challenging summer will lead to future roles at NVIDIA.
Spotting a meteor flash across the sky is a rare event for most people. Not so for the operators of the CAMS meteor shower surveillance project, who frequently spot more than a thousand in a single night and recently discovered two new showers.
CAMS, which stands for Cameras for Allsky Meteor Surveillance, was founded in 2010. Since 2017, it’s been improved by researchers using AI at the Frontier Development Lab, in partnership with NASA and the SETI Institute.
The project uses AI to identify whether a point of light moving in the night sky is a bird, plane, satellite or, in fact, a meteor. The CAMS network consists of cameras that take pictures of the sky at a rate of 60 frames per second.
The AI pipeline also verifies the findings to confirm the direction from which meteoroids, small pieces of comets that cause meteors, approach the Earth. The project’s AI model training is optimized on NVIDIA TITAN GPUs housed at the SETI Institute.
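To make the classification idea concrete, here's a toy sketch — not the actual CAMS pipeline — of how a moving light's brightness curve can separate object classes. It uses a hand-rolled nearest-centroid classifier on synthetic light curves for meteors (a brief, sharp flash), satellites (a slow, steady glint) and planes (a periodic strobe). The curve shapes and the two features are illustrative assumptions, not CAMS's real model or data:

```python
import math
import random

random.seed(0)
FRAMES = 60  # one second of video at the cameras' 60 fps

def meteor():
    # brief, sharply peaked flash (synthetic, illustrative shape)
    c, w = random.uniform(20, 40), random.uniform(1.5, 3.0)
    return [math.exp(-((t - c) / w) ** 2) + random.gauss(0, 0.02)
            for t in range(FRAMES)]

def satellite():
    # slow, steady glint
    base = random.uniform(0.2, 0.4)
    return [base + random.gauss(0, 0.02) for _ in range(FRAMES)]

def plane():
    # periodic strobe on top of a dim body
    base = random.uniform(0.2, 0.3)
    return [base + (0.5 if t % 15 < 2 else 0.0) + random.gauss(0, 0.02)
            for t in range(FRAMES)]

def features(curve):
    # two simple light-curve features: peak-to-mean brightness, and
    # the fraction of frames brighter than half the peak
    peak, mean = max(curve), sum(curve) / len(curve)
    frac_bright = sum(v > peak / 2 for v in curve) / len(curve)
    return (peak / mean, frac_bright)

def centroid(rows):
    return tuple(sum(r[i] for r in rows) / len(rows) for i in range(2))

# learn one centroid per class from labelled synthetic tracks
makers = {"meteor": meteor, "satellite": satellite, "plane": plane}
cents = {k: centroid([features(make()) for _ in range(50)])
         for k, make in makers.items()}

def classify(curve):
    x = features(curve)
    return min(cents, key=lambda k: sum((a - b) ** 2
                                        for a, b in zip(x, cents[k])))
```

On held-out synthetic tracks this toy classifier separates the three classes cleanly; the real pipeline, of course, works from actual camera imagery and must handle far messier data.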
Each night’s meteor sightings are then mapped onto the NASA meteor shower portal, a visualization tool available to the public. All meteor showers identified since 2010 are available on the portal.
CAMS detected two new meteor showers in mid-May, called the gamma Piscis Austrinids and the sigma Phoenicids. They were added to the International Astronomical Union’s meteor data center, which has recorded 1,041 unique meteor showers to date.
Analysis found both showers to be caused by meteoroids from long-period comets, which take more than 200 years to complete an orbit around the sun.
Improving the Meteor Classification Process
Peter Jenniskens, principal investigator for CAMS, has been classifying meteors since he founded the project in 2010. Before having access to NVIDIA’s GPUs, Jenniskens would look at the images these cameras collected and judge by eye if a light curve from a surveyed object fit the categorization for a meteor.
Now, the CAMS pipeline is entirely automated, from the transferring of data from an observatory to the SETI Institute’s server, to analyzing the findings and displaying them on the online portal on a nightly basis.
With the help of AI, researchers have been able to expand the project and focus on its real-world impact, said Siddha Ganju, a solutions architect at NVIDIA and member of FDL’s AI technical steering committee who worked on the CAMS project.
“The goal of studying space is to figure out the unknowns of the unknowns,” said Ganju. “We want to know what we aren’t yet able to know. Access to data, instruments and computational power is the holy trifecta available today to make discoveries that would’ve been impossible 50 years ago.”
Public excitement around the CAMS network has spurred it to expand the number of cameras fourfold since the project began incorporating AI in 2017. With stations all over the world, from Namibia to the Netherlands, the project now hunts for hour-long meteor showers, which are visible only in a small part of the world at a given time.
Applying the Information Gathered
The AI model, upon identifying a meteor, calculates the direction it’s coming from. According to Jenniskens, meteors come in groups, called meteoroid streams, which are mostly caused by comets. A comet can approach from as far as Jupiter or Saturn, he said, and when it’s that far away, it’s impossible to see until it comes closer to Earth.
The project’s goal is to enable astronomers to look along the path of an approaching comet and provide enough time to figure out the potential impact it may have on Earth.
Mapping out all discoverable meteor showers brings us a step closer to figuring out what the entire solar system looks like, said Ganju, which is crucial to identifying the potential dangers of comets.
But this map, NASA’s meteor shower portal, isn’t just for professional use. The visualization tool was made available online with the goal of “democratizing science for citizens and fostering interest in the project,” according to Ganju. Anyone can use it to find out what meteor showers are visible each night.
Hugging Face is more than just an adorable emoji — it’s a company that’s demystifying AI by transforming the latest developments in deep learning into usable code for businesses and researchers.
Research engineer Sam Shleifer spoke with AI Podcast host Noah Kravitz about Hugging Face NLP technology, which is in use at over 1,000 companies, including Apple, Bing and Grammarly, across fields ranging from finance to medical technology.
Hugging Face’s models serve a variety of purposes for their customers, including autocompletion, customer service automation and translation. Their popular web application, Write with Transformer, can even take half-formed thoughts and suggest options for completion.
Shleifer is currently developing models that are accessible to everyone, whether or not they are proficient coders.
In the next few years, Shleifer envisions the continued growth of smaller NLP models that power a wave of chat apps with state-of-the-art translation capabilities.
Key Points From This Episode:
Hugging Face first launched an original chatbot app before moving into natural language processing models. The move was well-received, and last year the company announced a $15 million funding round.
The company is a member of NVIDIA Inception, a virtual accelerator that Shleifer credits with significantly accelerating their experiments.
Hugging Face has released over 1,000 models trained with unsupervised learning and the Open Parallel Corpus project, pioneered by the University of Helsinki. These models are capable of machine translation in a huge variety of languages, even for low-resource languages with minimal training data.
“We’re trying to make state-of-the-art NLP accessible to everyone who wants to use it, whether they can code or not code.” — Sam Shleifer [1:44]
“Our research is targeted at this NLP accessibility mission — and NLP isn’t really accessible when models can’t fit on a single GPU.” — Sam Shleifer [10:38]
Dr. Pushpak Bhattacharyya’s work is giving computers the ability to understand one of humanity’s most challenging, and amusing, modes of communication. Bhattacharyya, director of IIT Patna, and a professor at the Computer Science and Engineering Department at IIT Bombay, has spent the past few years using GPU-powered deep learning to detect sarcasm.
At Oracle, customer service chatbots use conversational AI to respond to users with more speed and complexity. Suhas Uliyar, vice president of bots, AI and mobile product management at Oracle, talks about how the newest wave of conversational AI can keep up with the nuances of human conversation.
Syed Ahmed, a research assistant at the National Technical Institute for the Deaf, is directing the power of AI toward another form of communication: American Sign Language. Ahmed has built a deep learning model that translates ASL into English.
Whether they’re tackling challenges at the cutting edge of physics, trying to tame a worldwide pandemic, or sorting their child’s Lego collection, innovators join NVIDIA’s developer program to help them solve their most challenging problems.
With the number of registered NVIDIA developers having just hit 2 million, NVIDIA developers are pursuing more breakthroughs than ever.
Their ranks continue to grow by larger numbers every year. It took 13 years to reach 1 million registered developers, and less than two more to reach 2 million.
Most recently, teams at the U.S. National Institutes of Health, Scripps Research Institute and Oak Ridge National Laboratory have been among the NVIDIA developers at the forefront of efforts to combat COVID-19.
Every Country, Every Field
No surprise. Whether they’re software programmers, data scientists or devops engineers, developers are problem solvers.
They write, debug and optimize code, often taking a set of software building blocks — frameworks, application programming interfaces and other tools — and putting them to work to do something new.
These developers include business and academic leaders from every region in the world.
In China, Alibaba and Baidu are among the most active GPU developers. In North America, those names include Microsoft, Amazon and Google. In Japan, it’s Sony, Hitachi and Panasonic. In Europe, they include Bosch, Daimler and Siemens.
All the top technical universities are represented, including Caltech, MIT, Oxford, Cambridge, Stanford, Tsinghua University, the University of Tokyo, and IIT campuses throughout India.
Look beyond the big names — there are too many to drop here — and you’ll find tens of thousands of entrepreneurs, hobbyists and enthusiasts.
Developers are signing up for our developer program to put NVIDIA accelerated computing tools to work across fields such as scientific and high performance computing, graphics and professional visualization, robotics, AI and data science, networking, and autonomous vehicles.
Registered developers account for 100,000 downloads a month, thousands participate each month in DLI training sessions, and thousands more engage in our online forums or attend conferences and webinars.
NVIDIA’s developer program, however, is just a piece of a much bigger developer story. There are now more than a billion CUDA GPUs in the world — each capable of running CUDA-accelerated software — giving developers, hackers and makers a vast installed base to work with.
As a result, the number of downloads of CUDA, which is free, without registration, is far higher than that of registered developers. On average, 39,000 developers sign up for memberships each month and 438,000 download CUDA each month.
More scientific breakthroughs are coming, as developers attack new HPC problems and, increasingly, deep learning.
William Tang, principal research physicist at the Princeton Plasma Physics Laboratory — one of the world’s foremost experts on fusion energy — leads a team using deep learning and HPC to advance the quest for cheap, clean energy.
Michael Kirk and Raphael Attie, scientists at NASA's Goddard Space Flight Center — among the many active GPU developers at NASA — rely on Quadro RTX data science workstations to analyze the vast quantities of data streaming in from satellites monitoring the sun.
And at UC Berkeley, astrophysics Ph.D. student Gerry Zhang uses GPU-accelerated deep learning to analyze signals from space for signs of intelligent extraterrestrial civilizations.
Outside of research and academia, developers at the world’s top companies are tackling problems faced by every one of the world’s industries.
At Intuit, Chief Data Officer Ashok Srivastava leads a team using GPU-accelerated machine learning to help consumers with taxes and help small businesses through the financial effects of COVID-19.
Arne Stoschek, head of autonomous systems at Acubed, the Silicon Valley-based advanced products and partnerships outpost of Airbus Group, is developing self-piloted air taxis powered by GPU-accelerated AI.
New Problems, New Businesses: Entrepreneurs Swell Developer Ranks
Other developers — many supported by the NVIDIA Inception program — work at startups building businesses that solve new kinds of problems.
Most telling: stories from developers working at the cutting edge of the arts.
Pierre Barreau has created an AI, named AIVA, which uses mathematical models based on the work of great composers to create new music.
And Raiders of the Lost Art — a collaboration between Anthony Bourached and George Cann, a pair of Ph.D. candidates at University College London — has used neural style transfer techniques to tease out hidden artwork in a Leonardo da Vinci painting.
Wherever you go, follow the computing power and you’ll find developers delivering breakthroughs.
How big is the opportunity for problem solvers like these? However many problems there are in the world.
Want more stories like these? No problem. Over the months to come, we’ll be bringing as many to you as we can.
With today’s beta launch on ChromeOS, Chromebooks now wield the power to play PC games using GeForce NOW.
Chromebook users join the millions on PC, Mac, SHIELD and Android mobile devices already playing their favorite games on our cloud gaming service with GeForce performance.
Getting started is simple. Head to play.geforcenow.com and log in with your GeForce NOW account. Signing up is easy: just choose either a paid Founders membership or a free account.
Right now is a great time to join. We just launched a six-month Founders membership that includes a Hyper Scape Season One Battle Pass token and exclusive Hyper Scape in-game content for $24.95. That’s a $64.94 value.
Once logged in, you’re only a couple clicks away from streaming a massive catalog of games. For the best experience, you’ll want to make those clicks with a USB mouse.
Distance Learning by Day, Distance Gaming by Night
Some students are heading back to school. Others are distance learning from home. However they’re learning, more students than ever rely on Chromebooks.
That’s because Chromebooks are great computers for studying. They’re fast, simple and secure devices that help you stay productive and connected.
Now, those same Chromebooks transform, instantly, into GeForce-powered distance gaming rigs, thanks to GeForce NOW.
Your Games on All Your Devices
Millions of GeForce NOW members play with and against their friends — no matter which platform they’re streaming on, whether that’s PC, Mac, Android or, now, Chromebooks.
That’s because when you stream games using GeForce NOW, you’re playing the PC version from digital stores like Steam, Epic Games Store and Ubisoft Uplay.
This is great for developers, who can bring their games to the cloud at launch, without adding development cycles.
And it’s great for the millions of GeForce NOW members. They’re tapping into an existing ecosystem anytime they stream one of more than 650 games instantly. That includes over 70 of the most-played free-to-play games.
When games like CD Projekt Red's Cyberpunk 2077 come out later this year, members will be able to play them on their Chromebooks the same day, streaming from GeForce NOW servers.
Anywhere You Go
Chromebooks, of course, are lightweight devices that go where you do. From home to work to school. Or from your bedroom to the living room.
GeForce NOW is the perfect Chromebook companion. Simply plug in a mouse and go. Our beta release gives Chromebook owners the power to play their favorite PC games.
Take game progress or character level-ups from a desktop to a phone and then onto Chromebook. You’re playing the games you own from your digital game store accounts. So your progress goes with you.
More PC Gaming Features Heading to the Cloud
The heart of GeForce NOW is PC gaming. We continue to tap into the PC ecosystem to bring more PC features to the cloud.
PC gamers are intimately familiar with Steam. Many have massive libraries from the popular PC game store. To support them, we just launched Steam Game Sync so they can sync games from their Steam library with their library in GeForce NOW. It’s quickly become one of our most popular features for members playing on PC and Mac.
Soon, Chromebook owners will be able to take advantage of the feature, too.
Over the past few months, we’ve added two GeForce Experience features. Highlights delivers automatic video capture so you can share your best moments, and Freestyle provides gamers the ability to customize a game’s look. In the weeks ahead, we’ll add support for Ansel — a powerful in-game camera that lets gamers capture professional-grade screenshots. These features are currently only available on PC and Mac. Look for them to come to Chromebooks in future updates.
More games. More platforms. Legendary GeForce performance. And now on Chromebooks. That’s the power to play that only GeForce NOW can deliver.
Alex Schepelmann went from being a teacher’s assistant for an Intro to Programming class to educating 40,000 YouTube subscribers by championing the mantra: anyone can make something super using AI and machine learning.
His YouTube channel, Super Make Something, posts two types of videos. “Basics” videos provide in-depth explanations of technologies and their methods, using fun, understandable lingo. “Project” videos let viewers follow along with instructions for creating a product.
About the Maker
Schepelmann got a B.S. and M.S. in mechanical engineering from Case Western Reserve University and a Ph.D. in robotics from Carnegie Mellon University. His master’s thesis focused on employing computer vision to identify grass and obstacles in a camera stream, and he was part of a team that created an award-winning autonomous lawnmower.
Now, he’s a technical fellow for an engineering consulting firm and an aerospace contractor supporting various robotics projects in partnership with NASA. In his free time, he creates content for his channel, based out of his home in Cleveland.
In his undergrad years, Schepelmann saw how classmates found the introductory programming class hard because the assignments didn’t relate to their everyday lives. So, when he got to teach the class as a grad student, he implemented fun projects, like coding a Tamagotchi digital pet.
His aim was to help students realize that choosing topics they’re interested in can make learning easy and enjoyable. Schepelmann later heard from one of his students, an art history major, that his class had inspired her to add a computer science minor to her degree.
“Since then, I’ve thought it was great to introduce these topics to people who might never have considered them or felt that they were too hard,” he said. “I want to show people that AI can be really fun and easy to learn. With YouTube, it’s now possible to reach an audience of any background or age range on a large scale.”
Schepelmann’s YouTube channel started as a hobby during his years at Carnegie Mellon. It’s grown to reach 2.1 million total views on videos explaining 3D printing, robotics and machine learning, including how to use the NVIDIA Jetson platform to train AI models.
His Favorite Jetson Projects
“It’s super, super easy to use the NVIDIA Jetson products,” said Schepelmann. “It’s a great machine learning platform and an inexpensive way for people to learn AI and experiment with computationally intensive applications.”
To show viewers exactly how, he’s created two Jetson-based tutorials:
Machine Learning 101: Intro to Neural Networks – Schepelmann dives into what neural networks are and walks through how to set up the NVIDIA Jetson Nano developer kit to train a neural network model from scratch.
Machine Learning 101: Naive Bayes Classifier – Schepelmann explains how the probabilistic classifier can be used for image processing and speech recognition applications, using the NVIDIA Jetson Xavier NX developer kit to demonstrate.
The creator has released the full code used in both tutorials on his GitHub site for anyone to explore.
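For readers curious what the second tutorial's technique looks like in code, here is a minimal from-scratch Gaussian Naive Bayes classifier in plain Python. It's an illustrative sketch of the general algorithm — not code from Schepelmann's tutorials, which demonstrate it on the Jetson developer kits:

```python
import math
from collections import defaultdict

class TinyGaussianNB:
    """Gaussian Naive Bayes: each feature is modelled as an independent
    normal distribution per class; prediction picks the class with the
    highest log-posterior."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.stats = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # clamp variance so a constant feature can't divide by zero
            variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                         for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances))
        return max(self.stats, key=log_posterior)

# toy usage: two well-separated 2D classes
model = TinyGaussianNB().fit(
    [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)],
    ["cat", "cat", "cat", "dog", "dog", "dog"])
```

The "naive" independence assumption is what keeps the model cheap enough to train and run on small devices, which is part of why it remains a popular teaching example for image and speech tasks.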
Where to Learn More
To make something super with Super Make Something, visit Schepelmann’s YouTube channel.
NVIDIA’s enterprise partner program has grown to more than 1,500 members worldwide and added new resources to boost opportunities for training, collaboration and sales.
The expanded NVIDIA Partner Network boasts an array of companies that span the globe and help customers across a variety of needs, from high performance computing to systems integration.
The NPN has seen exponential growth over the past two years, and these new program enhancements enable future expansion as Mellanox and Cumulus partner programs are set to be integrated into NPN throughout 2021.
Mellanox and Cumulus bring strong partners into the NVIDIA fold. Focused on enterprise data center markets, they provide accelerated, disaggregated and software-defined networking to meet the rapid growth in AI, cloud and HPC.
In anticipation of this growth, the NPN has introduced educational opportunities, tools and resources for training and collaboration, as well as added sales incentives. These benefits include:
Industry-Specific Training Curriculums: New courses and enablement tools in healthcare, higher education and research, financial services and insurance, and retail. Additional courses in energy and telco are coming next year.
NPN Learning Maps: These dramatically reduce the time partners need to get up and running. Partners can discover and build their NVIDIA learning matrix based on industry and cross-referenced by role, including sales, solution architect or data scientist.
New tools and resources:
AI Consulting Network: New AI consulting services for data scientists and solution architects who are part of our Service Delivery Partner-Professional Services program to help build and deploy HPC and AI solutions.
Enhanced NPN Partner Portal: Expanded to allow access to the vast storehouse of NVIDIA-built sales tools and data, including partner rebate history and registered opportunities. The simplified portal gives partners increased visibility and easy access to the information required to quickly track sales and build accurate forecasts.
Industry-Specific Marketing Campaigns: Provides partners with the opportunity to build campaigns that more accurately target customers with content built from data-driven insights.
New sales incentives:
A fixed backend rebate for Elite-level Solution Provider and Solutions Integration partners for compute, compute DGX, visualization and virtualization.
An enhanced quarterly performance bonus program, incorporating an annualized goal to better align with sudden fluctuations in partner selling seasons.
Dedicated market development funds for Elite-level providers and integration partners for most competencies.
NPN expanded categories:
Solution advisors focused on storage solutions and mutual reference architectures
Federal government system integrators
The NVIDIA Partner Network is dedicated to supporting partners that deliver world-class products and services to customers. The NPN collaborates with hundreds of companies globally, across a range of businesses and competencies, to serve customers in HPC, AI and emerging high-growth areas such as visualization, edge computing and virtualization.
DGX SuperPODs are driving business results for companies like Continental in automotive, Lockheed Martin in aerospace and Microsoft in cloud-computing services.
Birth of an AI System
The story of how and why NVIDIA built Selene starts in 2015.
NVIDIA engineers started their first system-level design with two motivations. They wanted to build something both powerful enough to train the AI models their colleagues were building for autonomous vehicles and general purpose enough to serve the needs of any deep-learning researcher.
The result was the SATURNV cluster, born in 2016 and based on the NVIDIA Pascal GPU. When the more powerful NVIDIA Volta GPU debuted a year later, the budding systems group’s motivation and its designs expanded rapidly.
AI Jobs Grow Beyond the Accelerator
“We’re trying to anticipate what’s coming based on what we hear from researchers, building machines that serve multiple uses and have long lifetimes, packing as much processing, memory and storage as possible,” said Michael Houston, a chief architect who leads the systems team.
As early as 2017, “we were starting to see new apps drive the need for multi-node training, demanding very high-speed communications between systems and access to high-speed storage,” he said.
AI models were growing rapidly, requiring multiple GPUs to handle them. Workloads were demanding new computing styles, like model parallelism, to keep pace.
So, in fast succession, the team crafted ever larger clusters of V100-based NVIDIA DGX-2 systems, called DGX PODs. They used 32, then 64 DGX-2 nodes, culminating in a 96-node architecture dubbed the DGX SuperPOD.
They christened it Circe for the irresistible Greek goddess. It debuted in June 2019 at No. 22 on the TOP500 list of the world’s fastest supercomputers and currently holds No. 23.
Cutting Cables in a Computing Jungle
Along the way, the team learned lessons about networking, storage, power and thermals. Those learnings got baked into the latest NVIDIA DGX systems, reference architectures and today’s 280-node Selene.
In the race through ever larger clusters to get to Circe, some lessons were hard won.
“We tore everything out twice, we literally cut the cables out. It was the fastest way forward, but it still had a lot of downtime and cost. So we vowed to never do that again and set ease of expansion and incremental deployment as a fundamental design principle,” said Houston.
The team redesigned the overall network to simplify assembling the system.
They defined modules of 20 nodes connected by relatively simple “thin switches.” Each of these so-called scalable units could be laid down, cookie-cutter style, turned on and tested before the next one was added.
The design let engineers specify set lengths of cables that could be bundled together with Velcro at the factory. Racks could be labeled and mapped, radically simplifying the process of filling them with dozens of systems.
Doubling Up on InfiniBand
Early on, the team learned to split up compute, storage and management fabrics into independent planes, spreading them across more, faster network-interface cards.
The number of NICs per GPU doubled to two. So did their speeds, going from 100 Gbit per second InfiniBand in Circe to 200G HDR InfiniBand in Selene. The result was a 4x increase in the effective node bandwidth.
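The 4x figure is simple arithmetic when viewed per GPU (node totals then scale with however many GPUs a node carries):

```python
def per_gpu_fabric_gbps(nics_per_gpu, nic_speed_gbps):
    # effective InfiniBand bandwidth feeding each GPU
    return nics_per_gpu * nic_speed_gbps

circe_gbps = per_gpu_fabric_gbps(1, 100)   # one 100G NIC per GPU in Circe
selene_gbps = per_gpu_fabric_gbps(2, 200)  # two 200G HDR NICs per GPU in Selene
assert selene_gbps == 4 * circe_gbps       # the 4x increase
```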
Likewise, memory and storage links grew in capacity and throughput to handle jobs with hot, warm and cold storage needs. Four storage tiers spanned from 100 TB/s memory links to 100 GB/s storage pools.
Power and thermals stayed within air-cooled limits. The default designs used 35kW racks typical in leased data centers, but they can stretch beyond 50kW for the most aggressive supercomputer centers and down to 7kW racks some telcos use.
Seeking the Big, Balanced System
The net result is a more balanced design that can handle today’s many different workloads. That flexibility also gives researchers the freedom to explore new directions in AI and high performance computing.
“To some extent HPC and AI both require max performance, but you have to look carefully at how you deliver that performance in terms of power, storage and networking as well as raw processing,” said Julie Bernauer, who leads an advanced development team that’s worked on all of NVIDIA’s large-scale systems.
In the best of times, it can take dozens of engineers a few months to assemble, test and commission a supercomputer-class system. NVIDIA had to get Selene running in a few weeks to participate in industry benchmarks and fulfill obligations to customers like Argonne.
And engineers had to stay well within public-health guidelines of the pandemic.
“We had skeleton crews with strict protocols to keep staff healthy,” said Bernauer.
“To unbox and rack systems, we used two-person teams that didn’t mix with the others — they even took vacation at the same time. And we did cabling with six-foot distances between people. That really changes how you build systems,” she said.
Even with the COVID restrictions, engineers racked up to 60 systems in a day, the maximum their loading dock could handle. Virtual log-ins let administrators validate cabling remotely, testing the 20-node modules as they were deployed.
Bernauer’s team put several layers of automation in place. That cut the need for people at the co-location facility where Selene was built, a block from NVIDIA’s Silicon Valley headquarters.
Slacking with a Supercomputer
Selene talks to staff over a Slack channel as if it were a co-worker, reporting loose cables and isolating malfunctioning hardware so the system can keep running.
“We don’t want to wake up in the night because the cluster has a problem,” Bernauer said.
It’s part of the automation customers can access if they follow the guidance in the DGX POD and SuperPOD architectures.
Thanks to this approach, the University of Florida, for example, is expected to rack and power up a 140-node extension to its HiPerGator system, switching on the most powerful AI supercomputer in academia within as little as 10 days of receiving it.
As an added touch, the NVIDIA team bought a telepresence robot from Double Robotics so non-essential designers sheltering at home could maintain daily contact with Selene. Tongue-in-cheek, they dubbed it Trip, a nod to early concerns that essential technicians on site might bump into it.
The fact that Trip is powered by an NVIDIA Jetson TX2 module was an added attraction for team members who imagined some day they might tinker with its programming.
Since late July, Trip’s been used regularly to let them virtually drive through Selene’s aisles, observing the system through the robot’s camera and microphone.
“Trip doesn’t replace a human operator, but if you are worried about something at 2 a.m., you can check it without driving to the data center,” she said.
Delivering HPC, AI Results at Scale
In the end, it’s all about the results, and they came fast.
In June, Selene hit No. 7 on the TOP500 list and No. 2 on the Green500 list of the most power-efficient systems. In July, it broke records in all eight systems tests for AI training performance in the latest MLPerf benchmarks.
“The big surprise for me was how smoothly everything came up given we were using new processors and boards, and I credit all the testing along the way,” said Houston. “To get this machine up and do a bunch of hard back-to-back benchmarks gave the team a huge lift,” he added.
The work pre-testing NGC containers and HPC software for Argonne was even more gratifying. The lab is already hammering on hard problems in protein docking and quantum chemistry to shine a light on the coronavirus.
At the same time, NVIDIA’s own researchers are using Selene to train autonomous vehicles and refine conversational AI, nearing advances they’re expected to report soon. Their work is among more than a thousand jobs run, often simultaneously, on the system so far.
Meanwhile, the team already has ideas on the whiteboard for what’s next. “Give performance-obsessed engineers enough horsepower and cables and they will figure out amazing things,” said Bernauer.
At top: An artist’s rendering of a portion of Selene.
As its evocative name suggests, Abyss Solutions is a company taking AI to places where humans can’t — or shouldn’t — go.
The brainchild of four University of Sydney scientists and engineers, the startup set out six years ago to improve the maintenance and observation of industrial equipment.
It began by developing advanced technology to inspect the most difficult-to-reach assets of urban water infrastructure, such as dams, reservoirs, canals and bridges, as well as ship hulls. Later, it zeroed in on an industry that often operates literally in the dark: offshore oil and gas platforms.
A few years ago, Abyss CEO Nasir Ahsan and CTO Suchet Bargoti were demonstrating to a Houston-based platform operator the insights they could generate from the image data collected by its underwater Lantern Eye 3D camera. The camera’s sub-millimeter accuracy provides a “way to inspect objects as if you’re taking them out of water,” said Bargoti.
An employee of the operator interrupted the meeting to describe an ongoing problem the company was having with topside equipment that was decaying and couldn’t be adequately repaired. Once it was clear that Abyss could provide detailed insight into the problem and how to solve it, no more selling was needed.
“Every one of these companies is dreading the next Deepwater Horizon,” said Bargoti, referencing the 2010 incident in which BP spilled nearly 5 million barrels of oil into the Gulf of Mexico, killing 11 people and countless wildlife, and costing the company $65 billion in cleanup and fines. “What they wanted to know is, ‘Will your data analytics help us understand what to fix and when to fix it?’”
Today, Abyss’s combination of NVIDIA GPU-powered deep learning algorithms, unmanned vehicles and innovative underwater cameras is enabling platform operators to spot faults and anomalies such as corrosion on equipment above and below the waterline, and address them before equipment fails, potentially saving millions of dollars and even a few human lives.
During the COVID-19 pandemic, the stakes have risen. Offshore rigs have emerged as hotbeds for the virus, forcing operators to adopt strict quarantine procedures that limit the number of people onsite to reduce spread and minimize interruptions.
Essentially, this has sped up the industry’s digital transformation push and fueled the urgency of Abyss’ work, said Bargoti. “They can’t afford to have these things happening,” he said.
Better Than Human Performance
Historically, inspection and maintenance of offshore platforms and equipment has been a costly, time-consuming and labor-intensive task for oil and gas companies. It often yields subjective findings that can lead to missed repairs and unplanned shutdowns.
An independent audit found that Abyss’ semantic segmentation models detect general corrosion with greater than 90 percent accuracy, while severe corrosion is identified with greater than 97 percent accuracy. Both are significant improvements over human performance, and the models also outperformed other AI companies’ offerings in the audit.
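The accuracy figures above come from comparing a model's predicted per-pixel labels against ground-truth masks. A minimal sketch of that pixel-wise accuracy computation, with toy mask data invented for illustration (not Abyss' audit data):

```python
import numpy as np

def pixel_accuracy(pred: np.ndarray, truth: np.ndarray) -> float:
    """Fraction of pixels where the predicted class label matches ground truth."""
    assert pred.shape == truth.shape
    return float((pred == truth).mean())

# Toy 4x4 segmentation masks: 0 = clean metal, 1 = corrosion
truth = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
pred = truth.copy()
pred[0, 0] = 1  # one mislabeled pixel out of 16

print(pixel_accuracy(pred, truth))  # → 0.9375
```

In practice an audit would also report per-class metrics such as intersection-over-union, since overall pixel accuracy can look high even when a rare class like severe corrosion is missed.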
What’s more, Abyss says that its oil and gas platform clients report reductions in operating costs by as much as 25 percent thanks to its technology.
Training of Abyss’s models, which rely on many terabytes of data (each platform generates about 1TB a day), occurs on AWS instances running NVIDIA T4 Tensor Core GPUs. The company also uses the latest versions of CUDA and cuDNN in conjunction with TensorFlow to power deep learning applications such as image and video segmentation and classification, and object detection.
Most of the data can be processed in the cloud because corrosion progresses slowly, but there are times when real-time AI is needed onsite, such as when a robotic vehicle needs to decide where to go next.
Taking Full Advantage of Inception
As a member of NVIDIA Inception, a program to help startups working in AI and data science get to market faster, Abyss has benefited from a try-before-you-buy approach to NVIDIA tech. That’s allowed it to experiment with technologies before making big investments.
It’s also getting valuable advice on what’s coming down the pipe and how to time its work with the release of new GPUs. Bargoti said NVIDIA’s regularly advancing technology is helping Abyss squeeze more data into each compute cycle, pushing it closer to its long-term vision.
“We want to be the intel in these unmanned systems that makes smart decisions and pushes the frontier of exploration,” said Bargoti. “It’s all leading to this better development of perception systems, better development of decision-making systems and better development of robotics systems.”
Abyss is taking a deep look at a number of additional markets where it believes its technology can help. The team is taking on growth capital and rapidly expanding globally.
“Continuous investment in R&D and innovation plays a critical role in ensuring Abyss can provide game-changing solutions to the industry,” he said.
Testing for COVID-19 has become more widespread, but addressing the pandemic will require quickly screening for and triaging patients who are experiencing symptoms.
Lunit, a South Korean medical imaging startup — its name is a portmanteau of “learning unit” — has created an AI-based system to detect pneumonia, often present in COVID-19 infected patients, within seconds.
The Lunit INSIGHT CXR system, which is CE marked, uses AI to quickly detect 10 different radiological findings on chest X-rays, including pneumonia and potentially cancerous lung nodules.
It overlays the results onto the X-ray image along with a probability score for the finding. The system also monitors progression of a patient’s condition, automatically tracking changes within a series of chest X-ray images taken over time.
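Turning per-finding probability scores like those above into a prioritized worklist for a radiologist can be sketched as follows. The finding names, scores and the 0.15 threshold are invented for illustration and are not Lunit's actual values or API.

```python
from typing import Dict, List, Tuple

def flag_findings(scores: Dict[str, float],
                  threshold: float = 0.15) -> List[Tuple[str, float]]:
    """Return findings at or above the threshold, highest probability first."""
    flagged = [(name, p) for name, p in scores.items() if p >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical per-finding probabilities from a chest X-ray analysis
scores = {"pneumonia": 0.82, "nodule": 0.07, "pneumothorax": 0.31}
print(flag_findings(scores))  # [('pneumonia', 0.82), ('pneumothorax', 0.31)]
```

Sorting by probability is one simple way to surface the most urgent findings first; a production system would also weigh clinical severity, not just model confidence.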
Lunit has recently partnered with GE Healthcare, which launched its Thoracic Care Suite using Lunit INSIGHT CXR’s AI algorithms to flag abnormalities on chest X-rays for radiologists’ review. It’s one of the first collaborations to bring AI from a medical startup to an existing X-ray equipment manufacturer, making AI-based solutions commercially available.
For integration of its algorithms with GE Healthcare and other partners’ products, Lunit’s hardware is powered by NVIDIA Quadro P1000 GPUs, and its AI model is optimized on the NVIDIA Jetson TX2i module. For cloud-based deployment, the company uses NVIDIA drivers and GPUs.
Lunit is a premier member of NVIDIA Inception, a program that helps startups with go-to-market support, expertise and technology. Brandon Suh, CEO of Lunit, said being an Inception partner “has helped position the company as a leader in state-of-the-art technology for social impact.”
AI Opens New Doors in Medicine
The beauty of AI, according to Suh, is its ability to process vast amounts of data and discover patterns, augmenting human ability while saving time and energy.
The founders of Lunit, he said, started with nothing but a “crazy obsession with technology” and a vision to use AI to “open a new door for medical practice with increased survival rates and more affordable costs.”
Initially, Lunit’s products were focused on detecting potentially cancerous nodules in a patient’s lungs or breasts, as well as analyzing pathology tissue slides. However, the COVID-19 outbreak provided an opportunity for the company to upgrade the algorithms being used to help alleviate the burdens of healthcare professionals on the frontlines of the pandemic.
“The definitive diagnosis for COVID-19 involves a polymerase chain reaction test to detect the virus’s genetic material, but the results take 1-2 days to be delivered,” said Suh. “In the meantime, the doctors are left without any clinical evidence that can help them make a decision on triaging the patients.”
With its newly refined algorithm, Lunit INSIGHT CXR can now detect pneumonia in a patient within seconds, helping doctors make immediate, actionable decisions for those in urgent need of care.
The Lunit INSIGHT product line, which provides AI analysis for chest X-rays and mammograms, has been commercially deployed and tested in more than 130 sites in countries such as Brazil, France, Indonesia, Italy, Mexico, South Korea and Thailand.
“We feel fortunate to be able to play a part in the battle against COVID-19 with what we do best: developing medical AI solutions,” said Suh. “Though AI’s considered cutting-edge technology today, it could be a norm tomorrow, and we’d like everyone to benefit from a more accurate and efficient way of medical diagnosis and treatment.”
The team at Lunit is developing algorithms to use with 3D imaging, in addition to its current 2D ones. It’s also looking to create software that analyzes a tumor’s microenvironment to predict whether a patient would respond to immunotherapy.
Learn more about Lunit at NVIDIA’s healthcare AI startups solutions webinar on August 13. Register here.