Introduction
Shmoney AI Financial Chat Agent is an Artificial Intelligence Agent, or “AI Agent,” with access to over 20 tools. These tools retrieve near real-time financial market data, U.S. economic data, global forex data, Nasdaq ticker data and reports, and cryptocurrency data, and they perform web searches and research. The agent is also time-aware.

My name is Ignas Vaitukaitis. I am a backend software engineer with a focus on Gen AI and a hobby trader. I am currently working full-time on Shmoney AI to improve the performance and accuracy of the AI Agent we offer.
State of The Chat Agent
The Financial Chat Agent is currently in an early open beta, and this is the first of many tests I will be conducting going forward.
I am doing this in the hopes of being transparent about the agent’s accuracy, its issues, and the work that needs to be done. These tests will be published here on the Shmoney AI website, and after each test, I will return to developing and improving the agent.
FinanceBench Benchmark
Today, on January 29, 2025, I ran the 149 questions available from the FinanceBench benchmark on our Shmoney AI Financial Chat Agent.
This is not an official rating or test performed by the Patronus AI team. I ran the test myself using the 149 FinanceBench questions provided on their Hugging Face page. More information on the FinanceBench Benchmark can be found [here] and [here].
Even so, this test gives a good overview of what the chat agent still needs in order to improve.
The Benchmark

Process
How I Did It and Grouping
I manually copied and pasted the questions from the FinanceBench questionnaire and compared the correct answers from their document to the answers provided by our Financial Chat Agent. Then, I saved our agent’s answers and assigned a color: green if the answer was correct, yellow if the answer was partially correct, or red if the answer was incorrect.
I decided to include the yellow category because some answers from the agent were 80–90 percent correct, but due to hallucinations, miscalculations, or misunderstandings of the question, they contained slight mistakes. While these are technically incorrect answers, it felt wrong to group them with responses that contained mostly false or inaccurate information.
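The three-color grading described above could be recorded in a small structure like the one below. This is only a minimal Python sketch: the `Grade` enum, the `GradedAnswer` record, and the `tally` helper are illustrative names assumed here, not part of the actual test process.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Grade(Enum):
    GREEN = "correct"
    YELLOW = "partially correct"
    RED = "incorrect"


@dataclass
class GradedAnswer:
    # Hypothetical record layout: one entry per benchmark question.
    question: str
    reference_answer: str
    agent_answer: str
    grade: Grade


def tally(answers):
    """Count answers per grade, reporting zero for grades that never occur."""
    counts = Counter(answer.grade for answer in answers)
    return {grade: counts.get(grade, 0) for grade in Grade}
```

Keeping yellow as its own grade, rather than folding it into red, preserves the distinction between a near miss and an answer that is mostly wrong.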
The Questions
The questions were open-book financial questions covering a wide range of topics. The agent had to answer questions such as:
- What is Adobe’s year-over-year change in unadjusted operating income from FY2015 to FY2016 (in units of percents and round to one decimal place)? Give a solution to the question by using the income statement.
- Does AMCOR have an improving gross margin profile as of FY2023? If gross margin is not a useful metric for a company like this, then state that and explain why.
- What is the FY2018 capital expenditure amount (in USD millions) for 3M? Give a response to the question by relying on the details shown in the cash flow statement.
- What is Kraft Heinz’s FY2019 inventory turnover ratio? Inventory turnover ratio is defined as: (FY2019 COGS) / (average inventory between FY2018 and FY2019). Round your answer to two decimal places. Please base your judgments on the information provided primarily in the balance sheet and the P&L statement.
- Are JnJ’s FY2022 financials that of a high growth company?
- Does AMD have a reasonably healthy liquidity profile based on its quick ratio for FY22? If the quick ratio is not relevant to measure liquidity, please state that and explain why.
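Several of these questions come down to short formulas. The sketch below shows three of them in Python. The inventory turnover definition follows the wording of the Kraft Heinz question. The quick ratio uses a common textbook definition (cash, marketable securities, and receivables over current liabilities), which is an assumption and may differ from the one the benchmark expects. All function names are illustrative, and the numbers in the usage lines are placeholders, not real company figures.

```python
def yoy_change_pct(previous, current):
    """Year-over-year change in percent, rounded to one decimal place."""
    return round((current - previous) / abs(previous) * 100, 1)


def inventory_turnover(cogs, inventory_prev, inventory_curr):
    """COGS divided by the average inventory of the two fiscal year-ends, rounded to two decimals."""
    average_inventory = (inventory_prev + inventory_curr) / 2
    return round(cogs / average_inventory, 2)


def quick_ratio(cash, marketable_securities, receivables, current_liabilities):
    """Assumed textbook definition: most liquid assets over current liabilities."""
    return (cash + marketable_securities + receivables) / current_liabilities


# Placeholder inputs only, chosen to make the arithmetic easy to follow.
print(yoy_change_pct(100.0, 125.0))              # 25.0
print(inventory_turnover(1000.0, 180.0, 220.0))  # 5.0
print(quick_ratio(50.0, 30.0, 20.0, 80.0))       # 1.25
```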
The Answers
The agent answered questions correctly when it was able to gather all the necessary data. However, it chose not to do extra research when the required data was not available in the provided sources (tools). This is not a limitation of the LLM but rather a result of how I have currently built the agent. The next step is to implement multi-step execution so that when the agent does not have the required data, it goes back to the beginning and attempts to retrieve it from other sources.
That said, the correct answers were well structured, providing formulas, calculations, explanations, and in-depth analysis of the topic. Even in this early version, the agent demonstrated impressive results.
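The multi-step idea mentioned above, where the agent tries other sources when the first one has no data, could look roughly like the following. This is a hedged Python sketch and not the agent's actual implementation: `gather_data` and the stub sources `filings_tool` and `web_search` are hypothetical names, and a real version would call the agent's tools instead of stubs.

```python
from typing import Callable, Optional, Sequence, Tuple

# A source is a (name, fetch) pair; fetch returns data, or None if it has nothing.
Source = Tuple[str, Callable[[str], Optional[str]]]


def gather_data(question: str, sources: Sequence[Source]) -> Optional[Tuple[str, str]]:
    """Try each source in order and return (source_name, data) from the first hit."""
    for name, fetch in sources:
        data = fetch(question)
        if data is not None:
            return name, data
    # No source had the data; the caller can then ask the user or decline to answer.
    return None


# Hypothetical stubs standing in for real tools.
def filings_tool(question: str) -> Optional[str]:
    return None  # pretend the filing data is missing


def web_search(question: str) -> Optional[str]:
    return "stub result for: " + question
```

Returning the source name along with the data makes it possible to show the user where a figure came from.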

The Results
The Correct
The agent correctly answered 75 out of 147 questions.
The correct answers were mostly those where the agent had gathered all the necessary data to answer the query. It already knew the formulas, performed the calculations, or conducted specific research.
The Partially Correct
The agent answered 32 out of 147 questions partially correctly.
These were cases where the agent provided a mostly correct answer but included misleading information in the response, preventing it from being classified as fully correct. If we group the correct and partially correct answers together, the agent answered 107 out of 147 questions somewhat correctly.
The Wrong
The agent answered 40 out of 147 questions incorrectly.
These answers completely missed the mark, and I will prioritize working on them. The issues are related to prompting, data sources, multi-step research, query confirmation with the user, and several other factors I have observed over the past few days. The goal is to bring this number down to ZERO.
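For reference, the counts reported above translate into percentage shares as follows. This is a small Python sketch that uses only the numbers from this section; the variable and function names are illustrative.

```python
# Counts as reported in the results above.
results = {"correct": 75, "partially correct": 32, "incorrect": 40}
total = sum(results.values())  # 147 graded answers


def share(count, total):
    """Percentage of the total, rounded to one decimal place."""
    return round(count / total * 100, 1)


shares = {label: share(count, total) for label, count in results.items()}
print(shares)  # {'correct': 51.0, 'partially correct': 21.8, 'incorrect': 27.2}

# Correct and partially correct answers grouped together (107 of 147).
print(share(results["correct"] + results["partially correct"], total))  # 72.8
```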
File of The Test
Conclusion and What Comes Next
The next steps for Shmoney AI Financial Chat Agent are clear, with the main goal being to reduce incorrect answers to ZERO, making this a reliable tool for finance professionals to gather research and perform analysis.
This was the first benchmark performed on the agent, and many more will follow, such as financial job interview questions and other measures of the agent’s current level. Each of these tests helps me identify the agent’s weak points.
If you are a trader, financial professional, or someone excited about AI in finance, you can use Shmoney AI for free by contacting me and requesting access.
Don’t forget to follow Shmoney AI on Reddit and X, where I will be sharing updates on progress and upcoming developments.