
Financial News

Deutsche Bank and DWS explore investment in life insurer Frankfurter Leben
biztoc · 13d ago

Deutsche Bank and its asset management arm DWS Group are weighing the possibility of taking a “significant” minority interest in insurance company Frankfurter Leben Gruppe, reported Bloomberg. The insurance consolidator is majority-owned by Fosun International. Such a move would mark Deutsche...

#TECH
iPhone 16 tops global smartphone sales in 2025; Apple and Samsung control top 10
news9live · 13d ago

Apple’s iPhone 16 emerged as the world’s best-selling smartphone in 2025, with Apple and Samsung dominating the global Top 10 models. Apple secured seven spots, driven by strong demand for the iPhone 17 series, while Samsung’s Galaxy A16 5G led Android sales and the S25 Ultra gained traction in key markets.

#TECH
AI agent evaluations: The hidden cost of deployment
cio_jp · 13d ago

Organizations deploying AI agents may be in for a nasty surprise when it comes to the cost of tuning their performance.

According to some surveys, nearly 80% of enterprises have deployed AI agents, but most don’t understand the cost of training them and evaluating their outputs, which can result in costs far exceeding expectations, experts say.

Many organizations are still experimenting to find the best ways to catch agent problems before they cause chaos after deployment, says Lior Gavish, cofounder and CTO at AI observability vendor Monte Carlo. Because many organizations use a second large language model to vet the outputs of an LLM-powered agent, agent testing can be many times more expensive than testing traditional software, he says. Moreover, this method, called LLM as a judge, can be more expensive than running the agent itself, as the cost of running an LLM over an extended period can add up quickly.

“It’s tricky to test or monitor these outputs,” Gavish says. “People basically ask another LLM to rate the performance of an LLM based on various criteria, and the criteria vary wildly between different use cases.”

Monte Carlo saw this problem itself when the company left an LLM-powered eval running for days and ended up with a five-figure bill, Gavish notes. “An LLM call usually is orders of magnitude more expensive than anything that we would do in traditional software,” he says.

LLMs rating LLMs

Using a second LLM to review the outputs of an agent can also be problematic because it assumes the second LLM’s conclusions are accurate, Gavish says. Questions about accuracy can add to costs if organizations keep running tests to verify results.

“These checks are non-deterministic and not even repeatable,” he says. “You might get different answers on different runs if you’re not careful, so it’s different from more traditional software monitoring or testing, where it either passed or it failed.”

The cost of agent evals can vary wildly depending on the complexity of the agent, says Russell Twilligear, head of AI R&D at AI-generated content provider BlogBuster. For example, an evaluation for a small, well-scoped agent can run into the thousands of dollars, while evals for more complex agents can cost tens of thousands of dollars, he says.

“You have to factor in all of the test runs, logging, and human reviews,” Twilligear notes. “Every single change means they have to rerun the evals, and that adds up pretty fast.”

Agent evals can be complicated because they test for several possible metrics, including agent reasoning, execution, data leakage, response tone, privacy, and even moral alignment, according to AI experts.

Good evals incorporate a human element, with subject-matter experts needed to check agent outputs, says Paul Ferguson, founder of Clearlead AI Consulting. A major challenge in agent evals is establishing what “correct” means in ambiguous use cases, he adds.

Most IT leaders budget for the obvious costs, including compute time, API calls, and engineering hours, but miss the cost of human judgment in defining what Ferguson calls the “ground truth.”

“When evaluating whether an agent properly handled a customer query or drafted an appropriate response, you need domain experts to manually grade outputs and achieve consensus on what ‘correct’ looks like,” he adds. “This human calibration layer is expensive and often overlooked.”

Software evals can be straightforward when organizations are checking that code compiles and passes all unit tests, he says. “But for vague queries like ‘Help me understand this data’ or ‘Draft a response to this customer,’ defining what constitutes a correct answer becomes genuinely difficult,” he adds. “Even humans can disagree in some cases.”

Agent evaluation advice

The sticker shock of agent evals rarely comes from the compute costs of the agent itself, but from the “non-deterministic multiplier” of testing, adds Chengyu “Cay” Zhang, founding software engineer at voice AI vendor Redcar.ai. He compares training agents to training new employees, with both having moods.

“You can’t just test a prompt once; you have to test it 50 times across different scenarios to see if the agent holds up or if it hallucinates,” he says. “Every time you tweak a prompt or swap a model, you aren’t just running one test; you’re rerunning thousands of simulations.”

There are several ways to run agent evals, including low-cost unit testing, synthetic grading using another AI model, red-team simulations, and high-cost human shadowing, in which a human expert runs alongside an agent for a week or more, Zhang says.

Organizations often look for shortcuts, usually by relying entirely on other AI models to do the grading, he says, recommending against that route.

“My view is that evaluations are an insurance policy,” he says. “Shortcuts in evals are just deferred technical debt that you pay with interest when the agent hallucinates in front of a VIP client. You might save $10,000 on evals today, but if your financial agent hallucinates a transaction, that cost is negligible compared to the brand damage.”

If an organization wants to save money, the better alternative is to narrow the agent’s scope instead of cutting back on testing, Zhang adds. “If you skip the expensive steps, like human review or red-teaming, you’re relying entirely on probability,” he says.

To limit eval costs, Clearlead AI Consulting’s Ferguson recommends that organizations start with use cases that have clear right and wrong answers, like code compilation, before tackling more subjective scenarios. Organizations should also use LLM evaluation frameworks such as LangSmith, PromptLayer, or Ragas rather than building their own tools from scratch, he advises.

IT teams should also start testing early, he adds. “Building evaluations before production is far cheaper than retrofitting them later,” Ferguson says.

Monte Carlo’s Gavish offers other ways to keep costs down, such as setting spending limits for evals and performing due diligence on which LLMs they use to test agents. “You can rightsize the model a little bit,” he says. “Of course, you can use the latest and greatest ChatGPT for every evaluation, but you probably shouldn’t.”
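The pattern the experts describe, running an LLM-as-a-judge eval many times per case while capping spend, can be sketched as a small harness. This is a hypothetical illustration, not any vendor's tooling: `EvalHarness`, `stub_judge`, and the dollar figures are invented for the example, and a real setup would replace `stub_judge` with a call to a second LLM given grading criteria.

```python
from dataclasses import dataclass, field

@dataclass
class EvalHarness:
    judge_fn: callable      # scores an (input, output) pair as a float in [0, 1]
    cost_per_call: float    # estimated cost of one judge call, in dollars
    budget: float           # hard spending limit for the whole eval run
    spent: float = 0.0
    scores: list = field(default_factory=list)

    def evaluate(self, cases, runs_per_case=5):
        # Re-run each case several times: judge verdicts are non-deterministic,
        # so a single pass can be misleading.
        for inp, out in cases:
            for _ in range(runs_per_case):
                # Enforce the spending cap before each judge call.
                if self.spent + self.cost_per_call > self.budget:
                    raise RuntimeError("eval budget exhausted")
                self.spent += self.cost_per_call
                self.scores.append(self.judge_fn(inp, out))
        return sum(self.scores) / len(self.scores)

# Stub judge for illustration: passes any non-empty answer.
def stub_judge(inp, out):
    return 1.0 if out else 0.0

harness = EvalHarness(judge_fn=stub_judge, cost_per_call=0.02, budget=1.00)
mean_score = harness.evaluate([("query A", "answer A"), ("query B", "")], runs_per_case=5)
print(round(mean_score, 2), round(harness.spent, 2))  # prints: 0.5 0.2
```

Even this toy version shows why costs multiply: two cases at five runs each already means ten judge calls, and every prompt tweak reruns all of them, which is the "non-deterministic multiplier" Zhang describes.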

#TECH
Experts warn Trump's AI images erode public trust
democratherald · 13d ago

LOS ANGELES — The Trump administration has not shied away from sharing AI-generated imagery online, embracing cartoonlike visuals and memes and promoting them on official White House channels.

#TECH
Tanzania eyes East Africa’s pharmaceutical hub crown {Business Africa}
biztoc · 13d ago

Tanzania has launched an ambitious program to develop its local pharmaceutical industry, with the goal of reducing dependence on imported medicines and positioning itself as a manufacturing hub for East Africa. At present, more than 80 percent of medicines and medical equipment used in the country...

#TECH
The Future of Design: Why an AI Image Generator Is a Game-Changer
techbullion · 13d ago

The world of design is undergoing a massive transformation. From digital art to marketing visuals and product mockups, businesses and creators are increasingly relying on technology to streamline workflows, enhance creativity, and deliver high-quality results faster. Among the most transformative innovations in recent years is the AI image generator, a tool that leverages artificial intelligence [...]

#TECH
Jiangsu vs Guangdong: why the battle to be China’s No 1 economy is heating up
scmp · 13d ago

The southern Guangdong province has been the largest engine powering China’s economic rise for decades. But the region is now in danger of losing its status as the country’s top regional economy, as a rival to the east outpaces its growth. Jiangsu, home to a wide range of multinationals and hi-tech enterprises, has long been Guangdong’s closest competitor: together, the two provinces account for over 20 per cent of China’s gross domestic product (GDP). And the region has shown greater dynamism...

#ECONOMY