Allganize Launches Korea's First LLM Agent Evaluation Platform

Feb 3, 2025 - 00:00
Korean AI company Allganize has launched the 'All-in-One Benchmark' (올인원 벤치마크), which it describes as the first platform in Korea to comprehensively evaluate the performance of LLM agents. The platform assesses the core capabilities of an agent from several angles: domain knowledge, use of tools to solve problems, understanding of conversational context, and information processing.

The agent performance of 12 LLMs, including Allganize's own sLLM, ChatGPT, EXAONE, Qwen, and DeepSeek, can be compared in depth using specialized benchmarks such as 'BFCL,' 'FunctionChatBench,' and 'TauBench.'

A notable feature is the speed with which new models can be evaluated. Entering a model name is enough to trigger automatic API implementation and evaluation, cutting a task that previously took 1 hour and 30 minutes to approximately 20 minutes. According to the company, the platform was the first to evaluate the agent performance of DeepSeek's latest model, 'V3,' and found it to be on a similar level to 'GPT-4o mini.'

Beyond agent capabilities, the platform also measures the general performance of LLMs, including overall language understanding, knowledge level, and instruction-following ability, using 12 public benchmarks such as 'ArenaHard,' 'Kobest,' and 'HAERAE.' Results are presented in a dashboard with scores out of 100.

Lee Chang-soo, CEO of Allganize, said, "We will actively support enterprises in adopting AI models and deepen performance analysis and research for the development of agent LLMs."

Allganize offers its proprietary small language model, the 'Alpha LLM model,' through its 'Alli' (알리) platform. The model's strengths are Korean language processing and document summarization, and it has been well received in on-premise deployments at financial institutions and public organizations.
