Nvidia is known for its GPUs, computer chips that are integral to processing intensive AI workloads. But it also has a thriving business in industry-specific AI applications and frameworks, enabling companies to harness AI capabilities across various sectors.
The AI Innovator recently sat down with Malcolm deMayo, Nvidia’s vice president of financial services, to discuss what the chipmaker is doing for Wall Street firms.
What follows is an edited transcript of that conversation.
The AI Innovator: How are you leveraging AI to transform financial services?
Malcolm deMayo: Traditional AI has been in financial services for many, many years, and we’ve been working with large financial firms across a number of different workloads, primarily high-performance computing in trading, underwriting, and use cases like that.
But what’s happened with the advent of ChatGPT is that generative AI is now available to virtually every practitioner in financial services. … Now everyone is a programmer because their programming language is the English language. So the opportunity to rethink or reimagine banking is enormous, and generative AI is going to be a tsunami-sized wave of change.
The previous technologies, such as mobile internet and cloud, took years before … they were adopted by financial firms. It was happening at a glacial pace. Adoption of generative AI is happening at the speed of light. Generative AI requires a new kind of compute, and so Nvidia is involved. It really starts with the fact that our founders’ observation years ago was that Moore’s law is dead, that CPU performance scaling has ended.
As you know, we have an energy crisis in the world, where there are energy constraints everywhere, power constraints everywhere. We also have a compute crisis. The cost of compute to run large-scale applications just continues to climb. The passion of Nvidia over the last 30 years has been to build a full-stack solution that we call accelerated compute. It is enabling companies in finance to run AI cost-efficiently and energy-efficiently. … You just really can’t do this efficiently without us.
Let’s pivot to how financial firms are thinking about using AI. The first use case that I want to talk about is document analysis. Generative AI enables financial firms to analyze text, image, and video – multimodal (content). They’ve never been able to analyze this mountain of data without the use of AI and accelerated compute. Traditional AI really focused on structured data – spreadsheet data, for example.
Now they have the opportunity to really tap into all of the information in hundreds of repositories of text data or PDF data stored throughout these firms. We’ve announced our PDF agent Blueprint built with NIMs (Nvidia Inference Microservices). (Nvidia Blueprints are reference workflows that include pre-trained AI models, microservices, reference code, documentation and deployment scripts clients can adopt and deploy for real-world use cases.)
Trillions of PDFs are created a year in financial services, and a PDF is usually multipage; it has graphs and pictures, and it’s really hard to extract information … you really needed humans to do that. Our PDF extraction agent Blueprint is enabling financial firms to simply extract multimodal information using a large language model.
Then couple that with our accelerated compute that is generating four petaflops of AI performance. It’s equivalent to four research assistants each reading, extracting and summarizing 1,000 books per second. That’s power. So now you have the ability, in real time, to extract information from these documents, summarize them, and create insights, create intelligence.
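To make the idea concrete, here is a minimal sketch of what calling a multimodal PDF-extraction microservice might look like from application code. The endpoint path, payload fields, and response shape are illustrative assumptions, not the actual Blueprint API.

```python
# Minimal sketch: send a PDF to a hypothetical extraction microservice.
# The URL and payload/response shapes are assumptions for illustration,
# not Nvidia's actual Blueprint interface.
import base64
import requests

NIM_URL = "http://localhost:8000/v1/extract"  # hypothetical local endpoint

def extract_pdf(path: str) -> dict:
    """Send a PDF and return the structured content the service found."""
    with open(path, "rb") as f:
        payload = {"document": base64.b64encode(f.read()).decode("ascii")}
    resp = requests.post(NIM_URL, json=payload, timeout=60)
    resp.raise_for_status()
    # Assumed response shape: extracted text plus any tables and figures.
    return resp.json()

result = extract_pdf("quarterly_report.pdf")  # e.g. result["text"], result["tables"]
```

The application code stays small because the heavy lifting, running the multimodal model on accelerated compute, happens inside the microservice.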
Which LLM are you using for the PDF extraction agent?
Our view is that all models are great. We love them all, and so we’ve built support for them inside our accelerated compute platform. … Basically, when Meta drops a new Llama model or Mistral drops a new Mistral mixture (of experts) model, or anybody builds a new pre-trained open-source model, we take that model and we put it in a NIM.
We’ve taken the pre-trained, open-source models and optimized them to run on our platform. We’ve packaged each one in a Kubernetes container, which makes it highly portable, and we include all the inference engines and runtimes in the container, so that you can deploy the first model in minutes. … We’ve turned off the parameters the model’s not using to make it more affordable. You can choose whatever model you want.
The beauty of what Meta and Mistral and others have done is to allow enterprises to use these models for free. We take that, we encapsulate it in our NIM, and we make these Blueprints available. So now, financial services firms can customize our PDF Blueprint if they want to, or just use it the way it is, to extract information.
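Because a running NIM container typically exposes an OpenAI-compatible endpoint, the application code against a locally deployed model can be very small. A rough sketch, assuming a Llama NIM serving on port 8000; the model identifier is an example and depends on which container you deploy.

```python
# Sketch: querying a locally deployed NIM through its OpenAI-compatible
# API. The model id and port are assumptions; check the container docs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used-locally")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # example id; varies by container
    messages=[{
        "role": "user",
        "content": "Summarize the key risk factors in this 10-K excerpt: ...",
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

The portability point follows from this: the same client code works whether the container runs on-premises or in a cloud Kubernetes cluster.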
The second use case in financial services that’s very hot right now is building virtual assistants. You can go on Nvidia.com and search for James, the customer avatar. He’s pre-trained to answer questions about Nvidia, but it’s based on the Blueprints I’m describing. So now you have the power of a core platform creating these human-like avatars, and it’s in a Blueprint format, so customers can adopt this quickly. Instead of just creating a Q&A bot, they can create a digital avatar so that the experience is more human-like, and we’ve found that customers have a better experience.
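A digital avatar of this kind is essentially a pipeline of microservices: speech recognition, a guardrailed LLM, text-to-speech, and facial animation. The sketch below stubs out each stage to show the turn loop; every function name here is a placeholder, not an Nvidia API.

```python
# Hedged sketch of one conversational turn in an avatar pipeline.
# Each function is a stub standing in for a real microservice.

def speech_to_text(audio: bytes) -> str:
    return "What does Nvidia make?"  # stand-in for an ASR service

def llm_answer(question: str) -> str:
    return "Nvidia builds GPUs and AI software."  # stand-in for a guardrailed LLM

def text_to_speech(reply: str) -> bytes:
    return reply.encode("utf-8")  # stand-in for a TTS service

def lipsync_frames(speech: bytes) -> list[int]:
    return [len(speech)]  # stand-in for the facial-animation stage

def handle_customer_turn(audio_in: bytes) -> tuple[bytes, list[int]]:
    """One turn: ASR -> LLM -> TTS -> animation."""
    text = speech_to_text(audio_in)
    reply = llm_answer(text)
    speech = text_to_speech(reply)
    return speech, lipsync_frames(speech)

speech, frames = handle_customer_turn(b"<audio bytes>")
```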
How do you deal with hallucinations, bias and explainability? These issues are especially critical in the heavily regulated financial services industry.
AI is not new to financial services. The responsibility of delivering responsible AI is on the financial services brand; our responsibility is to provide them with the right tools to be able to do this. It also depends on the use case. If you’re doing a credit score, the ability to explain why you’re extending credit to one customer and not another is very important. The first step is to stay away (initially) from tasks that are going to be heavily regulated. Start building the capability to prove out (the AI model) first.
From an Nvidia perspective, when we build an AI model, we make sure we create a model card that shows what data we used to train it, and we prove that we have the rights to that data, so there is full transparency. In addition, we built guardrails into that composable microservices framework that enable the model to stay on topic. …
In addition to topical safety, our guardrails allow us to connect to backend databases or data systems that are secure and approved by the financial institution, and we’re also fact-checking the answers to make sure that no hallucinations get back to an end user. To fact-check, we built a retriever (a Retrieval-Augmented Generation technique) that taps into knowledge bases of the information inside your firm. When an employee or a customer asks a question, that question is answered against those knowledge bases.
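A minimal retrieval-augmented generation loop looks roughly like the sketch below. The embedding function and similarity search are toy stand-ins for the firm-approved embedding models and vector stores a production retriever would use.

```python
# Toy RAG sketch: retrieve approved passages, then constrain the prompt
# to them so the model answers from firm data rather than free recall.
from dataclasses import dataclass

import numpy as np

@dataclass
class Document:
    text: str
    embedding: np.ndarray

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a firm-approved embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(8)

def retrieve(query: str, kb: list[Document], k: int = 3) -> list[Document]:
    """Return the k knowledge-base documents most similar to the query."""
    q = embed(query)
    scores = [float(d.embedding @ q) /
              (np.linalg.norm(d.embedding) * np.linalg.norm(q)) for d in kb]
    top = np.argsort(scores)[::-1][:k]
    return [kb[int(i)] for i in top]

def build_prompt(query: str, passages: list[Document]) -> str:
    """The LLM sees only approved context, which makes answers checkable."""
    context = "\n\n".join(p.text for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

kb = [Document(t, embed(t)) for t in [
    "Card disputes must be filed within 60 days.",
    "Wire transfers over $10,000 require additional review.",
    "Branch hours are 9am to 5pm on weekdays.",
]]
print(build_prompt("How long do I have to dispute a charge?",
                   retrieve("dispute a charge", kb, k=2)))
```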
The third use case is fraud (detection). Credit card fraud is a growth industry. Since COVID, digital transactions and e-commerce transactions have exploded. Card issuing banks today are using rules-based software to monitor transactions. These are brittle. Humans have to code them.
We’re using a machine learning workflow, and we’re embedding graph features. We’re embedding transformer features so that you still have the lightning-fast execution of machine learning – because you only have about one to 1.5 seconds to approve and authorize a credit card transaction – but we want to improve the intelligence of the model.
Going from a rules-based environment to this model will reduce false positives in the high double-digit percentages, which will help the banks. … We’re going to build a NIM agent Blueprint of this as well. … It’s optimized to run on our platform. So you can see a 5x performance improvement in inference, which is usually about an 80% reduction in cost.
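One common shape for this kind of hybrid workflow, sketched below under stated assumptions rather than as Nvidia’s exact Blueprint, is to enrich each transaction with graph-derived features (for example, how connected a card and merchant are) and score it with a fast tree model so authorization stays inside the one-to-1.5-second budget.

```python
# Sketch: graph-derived features feeding a fast tree model for fraud
# scoring. Feature choices and the classifier are illustrative, not
# Nvidia's actual workflow.
from collections import defaultdict

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy transaction log: (card_id, merchant_id, amount, is_fraud)
transactions = [
    ("c1", "m1", 20.0, 0), ("c1", "m2", 900.0, 1),
    ("c2", "m1", 35.0, 0), ("c3", "m2", 850.0, 1),
    ("c2", "m3", 15.0, 0), ("c3", "m3", 40.0, 0),
]

# Graph-derived features: node degrees in the card-merchant graph.
card_deg, merch_deg = defaultdict(int), defaultdict(int)
for card, merch, _, _ in transactions:
    card_deg[card] += 1
    merch_deg[merch] += 1

X = np.array([[amt, card_deg[c], merch_deg[m]]
              for c, m, amt, _ in transactions])
y = np.array([label for *_, label in transactions])

model = GradientBoostingClassifier().fit(X, y)

# Scoring is a few dictionary lookups plus one tree pass, cheap enough
# to sit inside the authorization path.
new_tx = np.array([[875.0, card_deg["c1"], merch_deg["m2"]]])
print(model.predict_proba(new_tx)[0, 1])  # estimated fraud probability
```

Unlike hand-coded rules, the model is retrained as fraud patterns shift, which is where the reduction in false positives comes from.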
Your CEO saw the demand for AI coming years ago. What best practices do you have to spark innovation at the company?
First, we engage directly with customers. Nobody knows where the friction is or where the problems are better than the people trying to use it every day. For example, when Bloomberg was creating Bloomberg GPT, they were having trouble getting the model … to do the work they wanted it to do.
They called us. We engaged with them and helped them. They were using some of our code, and if a company is using our platform, we will engage and help them. We took the training time from 50-plus days down to 14-ish days, but in the process we also learned that we needed an inference engine. This was two years ago, and so we built an inference engine called TensorRT-LLM, part of our Triton libraries, which is part of Nvidia AI Enterprise.
That’s one way we innovate. Another way is we work with every AI builder we can possibly work with. Right now, it’s probably 1,600 companies and growing. We engage with them, we understand their ideas, and we help them and teach them how to use the platform to embed generative AI and use it efficiently in the work they’re doing. We learn a lot in the course of these (collaborations). Then, of course, we have some of the greatest minds on the planet doing research in our R&D labs, and we’re going to spend $8.5 billion this year on R&D alone.