

Ethical Hacking News

Exposing AI Inference Frameworks: The Rise of ZeroMQ Vulnerabilities



Critical Remote Code Execution Vulnerabilities Discovered in AI Inference Frameworks
AI inference frameworks from Meta, Nvidia, Microsoft, and open-source PyTorch-based projects have been found to contain critical remote code execution flaws. If left unaddressed, these vulnerabilities can lead to catastrophic attacks, model theft, and data breaches.

  • Cybersecurity researchers have discovered critical remote code execution vulnerabilities in popular AI inference frameworks from Meta, Nvidia, Microsoft, and open-source PyTorch projects.
  • The root cause of the issue lies in the unsafe use of ZeroMQ and Python's pickle deserialization.
  • More than a dozen frameworks are affected, including Meta's Llama large language model (LLM) framework, NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, vLLM, and SGLang.
  • The discovered vulnerabilities have significant potential impact, including arbitrary code execution, privilege escalation, model theft, and malicious payload deployment.



  • Cybersecurity researchers have uncovered critical remote code execution vulnerabilities impacting major artificial intelligence (AI) inference engines, including those from Meta, Nvidia, and Microsoft. The findings highlight the importance of addressing ZeroMQ-related vulnerabilities in AI frameworks to prevent catastrophic attacks.

    The root cause of these vulnerabilities lies in the unsafe use of ZeroMQ (ZMQ) and Python's pickle deserialization. Oligo Security researcher Avi Lumelsky attributed the issue to a pattern called ShadowMQ, which has propagated across multiple projects through code reuse: insecure deserialization logic was copied from project to project without review, leaving each receiving endpoint exposed to remote code execution.
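    To make the pattern concrete, the following sketch shows, in simplified and hypothetical form, what such an endpoint looks like: a worker that blindly unpickles whatever arrives on an unauthenticated ZeroMQ TCP socket. The function names and port are illustrative and do not come from any of the affected projects.

    import pickle
    import zmq

    def handle_request(request) -> None:
        # Placeholder for the engine's real scheduling/inference logic.
        print("received:", request)

    def recv_loop(bind_addr: str = "tcp://0.0.0.0:5555") -> None:
        ctx = zmq.Context.instance()
        sock = ctx.socket(zmq.PULL)      # no authentication or encryption
        sock.bind(bind_addr)             # reachable by anything on the network
        while True:
            raw = sock.recv()            # attacker-controllable bytes
            request = pickle.loads(raw)  # unsafe: unpickling untrusted data
            handle_request(request)

    Because a plain ZMQ TCP socket accepts messages from any peer that can reach the port, every byte delivered to it flows straight into pickle.loads.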

    One of the affected frameworks is Meta's Llama large language model (LLM) framework, which the company patched in October 2024. However, other inference engines such as NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, vLLM, and SGLang were subsequently found to be vulnerable to similar issues.

    "All contained nearly identical unsafe patterns: pickle deserialization over unauthenticated ZMQ TCP sockets," Lumelsky said. "Different maintainers and projects maintained by different companies – all made the same mistake." The researcher noted that some cases were caused by a direct copy-paste of code, while others were the result of borrowing from the same flawed source.

    The discovered vulnerabilities have been assigned the following identifiers: CVE-2025-30165 (CVSS score: 8.0) for vLLM, CVE-2025-23254 (CVSS score: 8.8) for NVIDIA TensorRT-LLM, and CVE-2025-60455 (CVSS score: N/A) for Modular Max Server. Microsoft's Sarathi-Serve remains unpatched.

    The potential impact of these vulnerabilities cannot be overstated, as inference engines form a crucial component within AI infrastructures. A successful compromise of a single node could permit an attacker to execute arbitrary code on the cluster, escalate privileges, conduct model theft, and even drop malicious payloads like cryptocurrency miners for financial gain.

    "The projects are moving at incredible speed, and it's common to borrow architectural components from peers," Lumelsky noted. "But when code reuse includes unsafe patterns, the consequences ripple outward fast." The researcher emphasized the importance of adopting safe coding practices and thoroughly testing AI frameworks before deployment.

    The discovery comes as a reminder that even seemingly secure AI frameworks can be vulnerable to exploitation. As AI continues to evolve at an unprecedented rate, it is essential to prioritize security and address these emerging vulnerabilities proactively.

    Summary:
    Cybersecurity researchers have discovered critical remote code execution vulnerabilities in popular AI inference frameworks from Meta, Nvidia, Microsoft, and open-source PyTorch projects such as vLLM and SGLang. The root cause lies in the unsafe use of ZeroMQ and Python's pickle deserialization, which, if left unaddressed, allows attackers to run arbitrary code across inference clusters, escalate privileges, and steal models.



    Related Information:
  • https://www.ethicalhackingnews.com/articles/Exposing-AI-Inference-Frameworks-The-Rise-of-ZeroMQ-Vulnerabilities-ehn.shtml

  • https://thehackernews.com/2025/11/researchers-find-serious-ai-bugs.html

  • https://nvd.nist.gov/vuln/detail/CVE-2025-30165

  • https://www.cvedetails.com/cve/CVE-2025-30165/

  • https://nvd.nist.gov/vuln/detail/CVE-2025-23254

  • https://www.cvedetails.com/cve/CVE-2025-23254/

  • https://nvd.nist.gov/vuln/detail/CVE-2025-60455

  • https://www.cvedetails.com/cve/CVE-2025-60455/


  • Published: Fri Nov 14 09:42:05 2025 by llama3.2 3B Q4_K_M

    © Ethical Hacking News. All rights reserved.
