Production Engineering on AI VOID

Signal Impacted by Twilio Social Engineering Attack

Tue, 26 May 2026 00:00:00 +0000

Incident: Signal Impacted by Twilio Social Engineering Attack
Date: 2022-08-08 | Duration: unknown | Severity: P1-high
Affected: A small number of Signal users | Systems: Twilio’s phone number verification services, Signal user registration/verification process
Root cause (summary): Twilio employees fell victim to a sophisticated phishing attack, leading to the compromise of their credentials and unauthorized access to Twilio’s internal support systems.

Timeline: Not available from public sources.

Incident Summary

On August 8, 2022, Signal was notified by its third-party phone number verification provider, Twilio, about a security incident. Twilio had experienced a sophisticated social engineering attack, where malicious actors successfully phished several of its employees. This compromise granted the attackers unauthorized access to Twilio’s internal support systems, which included access to customer data for a limited number of Twilio clients.

LLM Guardrail Failure in Production: The Discrepancy Between Test and Reality

Mon, 25 May 2026 00:00:00 +0000

Incident: LLM Guardrail Failure in Production: The Discrepancy Between Test and Reality
Date: unknown | Duration: ~6.0 hours | Severity: P1-high
Affected: unknown, potentially thousands over time | Systems: LLM Inference Service, Guardrail Enforcement Layer, User-Facing Application
Root cause (summary): LLM guardrails, which performed adequately in pre-production testing, failed to prevent undesirable outputs when exposed to the full spectrum of real-world user inputs and sustained production load.

Incident Summary

On an unspecified date, our AI-powered service experienced a critical incident in which the Large Language Model (LLM) guardrails, designed to filter and prevent undesirable outputs, failed in our production environment. As a result, inappropriate or harmful content was generated and delivered to users through our primary user-facing application. The incident persisted for approximately 6 hours and was classified as P1-high because of its direct impact on user experience and brand reputation.

OpenAI macOS App Supply Chain Attack via TanStack

Sat, 23 May 2026 00:00:00 +0000

Incident: OpenAI macOS App Supply Chain Attack via TanStack
Date: 2026-05-21 | Duration: unknown | Severity: P0-critical
Affected: All macOS app users (potential for future compromise) | Systems: OpenAI macOS app, OpenAI iOS app (potential), OpenAI Windows app (potential)
Root cause (summary): The compromise of two OpenAI employee devices via a malicious TanStack npm package, which was part of the broader Shai-Hulud supply chain attack, led to the exfiltration of private code signing certificates for macOS, iOS, and Windows.

RubyGems Malicious Package Upload Security Incident

Fri, 22 May 2026 00:00:00 +0000

Incident: RubyGems Malicious Package Upload Security Incident
Date: 2025-09-10 | Duration: ~192 hours | Severity: P0-critical
Affected: RubyGems users (specific number unknown) | Systems: RubyGems.org package registry, RubyGems.org user signup system
Root cause (summary): The incident was caused by the improper use or compromise of administrative credentials, allowing unauthorized uploads of hundreds of malicious packages to the RubyGems.org registry.

Timeline of Events

Date (UTC)	Event
September 10-18, 2025	Hundreds of malicious packages were uploaded to RubyGems.org, leading to the suspension of new signups.
September 18, 2025	Ruby Central notified a former RubyGems.org operator of their termination as part of the incident response.
October 10, 2025	Ruby Central released a comprehensive security incident report addressing the events.

Incident Summary

On September 10, 2025, RubyGems.org, the primary package registry for the Ruby programming language, experienced a severe security breach involving the unauthorized upload of hundreds of malicious packages. This critical incident, which spanned approximately eight days, severely impacted the integrity of the RubyGems ecosystem and necessitated the suspension of new user signups to contain the threat.

DENIC .de TLD DNSSEC Outage

Thu, 21 May 2026 00:00:00 +0000

Incident: DENIC .de TLD DNSSEC Outage
Date: 2026-05-05 | Duration: unknown | Severity: P0-critical
Affected: Millions of domains unreachable | Systems: .de TLD DNSSEC validation, DNS resolvers globally
Root cause (summary): DENIC, the registry operator for the .de TLD, published incorrect DNSSEC signatures for the .de zone.

Incident Summary

On May 5, 2026, the internet experienced a significant disruption affecting millions of domains under the .de country-code top-level domain (ccTLD). This outage was triggered when DENIC, the authoritative registry operator for the .de TLD, began publishing incorrect DNSSEC signatures for its zone.

Mini Shai-Hulud Supply Chain Attack on TanStack npm Packages

Tue, 19 May 2026 00:00:00 +0000

Incident: Mini Shai-Hulud Supply Chain Attack on TanStack npm Packages
Date: 2026-05-17 | Duration: ~2.0 hours | Severity: P0-critical
Affected: unknown (potentially millions of downstream users) | Systems: TanStack npm packages, npm registry, developer build systems
Root cause (summary): Malicious versions of TanStack npm packages were published to the npm registry, containing the self-propagating ‘Mini Shai-Hulud’ worm, indicating a compromise of TanStack’s publishing credentials or build process.

Node-IPC Supply Chain Attack: Protestware Incident

Tue, 19 May 2026 00:00:00 +0000

Incident: Node-IPC Supply Chain Attack: Protestware Incident
Date: 2022-03-08 | Duration: Malicious versions available from early March 2022 until mitigated in March 2022 | Severity: P0-critical
Affected: unknown, potentially widespread across the JavaScript ecosystem | Systems: Node.js applications using node-ipc, any system with a direct or transitive dependency on node-ipc
Root cause (summary): The maintainer of the ‘node-ipc’ package published malicious versions (e.g., 9.2.x, 10.1.x) to npm, containing ‘protestware’ designed to wipe files on systems located in specific geographic regions.
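Because the exposure includes transitive dependencies, a first triage step is usually to scan lockfiles rather than top-level manifests. The following is a minimal sketch, assuming an npm lockfile in the v2/v3 layout (a top-level "packages" map keyed by install path). The names SUSPECT_SERIES, is_suspect_node_ipc, and find_suspect_installs are hypothetical, and the flagged series are only the examples named in the summary above, not an exhaustive advisory list. A hit means "review this install", not "confirmed malicious".

```python
# Release series named in the incident summary as examples (9.2.x, 10.1.x).
# Assumption: this is a review list for triage, not an authoritative advisory.
SUSPECT_SERIES = {(9, 2), (10, 1)}


def is_suspect_node_ipc(version: str) -> bool:
    """Return True if the version belongs to a release series flagged for review."""
    parts = version.strip().lstrip("v").split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Unparseable versions are not flagged here; handle them separately.
        return False
    return (major, minor) in SUSPECT_SERIES


def find_suspect_installs(lock: dict) -> list[str]:
    """Scan a parsed package-lock.json (v2/v3) for node-ipc installs to review."""
    hits = []
    for path, meta in lock.get("packages", {}).items():
        # Nested installs look like "node_modules/a/node_modules/node-ipc".
        if not path.endswith("node_modules/node-ipc"):
            continue
        version = meta.get("version", "")
        if is_suspect_node_ipc(version):
            hits.append(f"{path}@{version}")
    return hits
```

In practice the dictionary would come from `json.load` on a project's package-lock.json, and any hits would be checked against the published security advisories for the exact affected versions.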

QUIC Congestion Window Stalling Due to Linux Kernel Idle Optimization Misport: Engineering Postmortem

Sun, 17 May 2026 00:00:00 +0000

Incident: QUIC Congestion Window Stalling Due to Linux Kernel Idle Optimization Misport
Date: 2023-08-15 (Discovered) | Duration: Latent for years, ~6 hours (diagnosis & fix deployment) | Severity: P1-high
Affected: All Cloudflare QUIC connections utilizing the quiche library, impacting global user experience, especially after packet loss. | Systems: Cloudflare quiche QUIC implementation, Linux kernel CUBIC porting layer, QUIC-enabled services.
Root cause (summary): Incorrect calculation of “idle” periods in quiche’s CUBIC congestion control port, preventing congestion window recovery after packet loss by perpetually resetting the idle timer.
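To illustrate why a miscounted idle period can freeze recovery, here is a toy model, not quiche's or the Linux kernel's code. It uses the standard CUBIC window curve with the constants from the CUBIC RFC (C = 0.4, beta = 0.7). The idle adjustment is modeled as shifting the start of the CUBIC epoch forward by the idle time. The flag `treat_every_gap_as_idle` is a hypothetical stand-in for the class of bug described in the summary; the exact faulty condition in quiche is not given here.

```python
def cubic_window(t, w_max, c=0.4, beta=0.7):
    """CUBIC window (in packets) t seconds after the current epoch started."""
    # K is the time the curve needs to climb back to w_max after a loss.
    k = (w_max * (1 - beta) / c) ** (1 / 3)
    return c * (t - k) ** 3 + w_max


def simulate(seconds, w_max, treat_every_gap_as_idle, send_interval=0.1):
    """Send continuously after a loss at t=0 and return the final window."""
    epoch_start = 0.0
    last_send = 0.0
    cwnd = cubic_window(0.0, w_max)  # equals beta * w_max right after the loss
    steps = round(seconds / send_interval)
    for i in range(1, steps + 1):
        now = i * send_interval
        gap = now - last_send
        if treat_every_gap_as_idle:
            # Hypothetical bug: every inter-send gap is counted as idle time,
            # so the epoch start advances in lockstep with the clock and the
            # elapsed time fed to the CUBIC curve never grows.
            epoch_start += gap
        cwnd = cubic_window(now - epoch_start, w_max)
        last_send = now
    return cwnd


stalled = simulate(10, w_max=100, treat_every_gap_as_idle=True)
healthy = simulate(10, w_max=100, treat_every_gap_as_idle=False)
```

In this model the healthy sender grows back past `w_max` within the ten simulated seconds, while the stalled sender stays at roughly `beta * w_max`, because the time input to the curve is reset on every send.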