
SOC 2 Incident Response: Requirements and Playbook

SOC 2 requires a tested incident response capability. This guide covers the requirements, how to build a playbook, what evidence auditors need, and common incident response mistakes.


GRCTrail Team


Security incidents are not hypothetical for SaaS companies. They are an operational reality. A misconfigured S3 bucket exposes customer data. A compromised dependency introduces malicious code into your build pipeline. A credential stuffing attack overwhelms your authentication service. When these events occur, SOC 2 requires that you detect them, respond systematically, and learn from them, and that you can prove you did all three.

Multiple Common Criteria address incident management across the entire lifecycle: monitoring for anomalies, evaluating whether events constitute incidents, executing response procedures, and communicating with affected parties. Auditors will not simply check that you have an incident response plan on file. They will test whether your team knows their roles, whether your detection systems are operational, whether you have actually executed the plan (or tested it), and whether post-incident reviews drove improvements.

This guide covers the specific SOC 2 requirements for incident response, how to build a playbook that works under pressure, what evidence auditors examine, and the mistakes that generate findings.

SOC 2 Incident Response Requirements

Incident response requirements span several Common Criteria within the Security category, which every SOC 2 report must include. Knowing exactly which criteria map to which capabilities helps you find gaps in your program before your auditor does.

CC7.2: Monitors System Components. Your organization must monitor infrastructure and application components to detect anomalies that indicate security events. This means you need functioning monitoring and alerting systems, not just documentation saying you intend to monitor.

CC7.3: Evaluates Security Events. When monitoring detects something abnormal, you need a defined process to triage the event and determine whether it constitutes a security incident requiring a response. Not every alert is an incident, but every alert must be evaluated.

CC7.4: Responds to Security Incidents. When an event is classified as an incident, you must execute a defined incident response program to understand, contain, remediate, and communicate the incident, including the coordination activities that happen while it is active. Communication is part of the response: you need defined procedures for notifying affected parties (customers, regulators, partners) based on the nature and severity of the incident, not a plan to figure it out in the moment.

CC7.5: Recovers from Identified Security Incidents. You must identify, develop, and implement activities to recover from incidents: restoring affected systems, determining root cause, and making changes that prevent recurrence and improve your response and recovery procedures.

CC2.3: Communicates Externally. This broader criterion addresses external communication processes, including how you notify customers and regulators about incidents that affect them.

For a complete mapping of all Common Criteria, see our SOC 2 Common Criteria guide.

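Overlap with GDPR breach notification: If your SaaS processes personal data of EU residents, an incident that affects personal data is likely a personal data breach under GDPR, with its own notification clock. A controller must notify the supervisory authority without undue delay and, where feasible, within 72 hours of becoming aware of the breach (unless the breach is unlikely to result in a risk to individuals), and must notify affected individuals without undue delay when the breach is likely to result in a high risk to them. If you act as a processor, you must notify your customer (the controller) without undue delay so they can meet those deadlines. Your incident response plan must account for these parallel timelines. See our GDPR data breach notification guide for the specific requirements.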
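Because the 72-hour window runs from the moment you become aware of a breach, it helps to record that timestamp explicitly and compute the deadline from it rather than estimating under pressure. The sketch below is a minimal Python illustration; the helper names are hypothetical, and it models only the supervisory-authority window, since notice to individuals has no fixed hour count.

```python
from datetime import datetime, timedelta, timezone

# GDPR Article 33: notify the supervisory authority without undue delay and,
# where feasible, no later than 72 hours after becoming aware of the breach.
SUPERVISORY_AUTHORITY_WINDOW = timedelta(hours=72)


def supervisory_authority_deadline(became_aware_at: datetime) -> datetime:
    """Latest time to notify the supervisory authority (hypothetical helper)."""
    if became_aware_at.tzinfo is None:
        # Naive timestamps make timezone mistakes easy; refuse them.
        raise ValueError("use a timezone-aware timestamp")
    return became_aware_at + SUPERVISORY_AUTHORITY_WINDOW


def hours_remaining(became_aware_at: datetime, now: datetime) -> float:
    """Hours left before the 72-hour window closes (negative once it has passed)."""
    remaining = supervisory_authority_deadline(became_aware_at) - now
    return remaining.total_seconds() / 3600


aware = datetime(2026, 3, 6, 14, 30, tzinfo=timezone.utc)
print(supervisory_authority_deadline(aware).isoformat())
print(hours_remaining(aware, datetime(2026, 3, 7, 14, 30, tzinfo=timezone.utc)))
```

The "became aware" timestamp is worth capturing in the incident record itself, because regulators and auditors will both ask when the clock started.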

Building Your Incident Response Plan

An incident response plan that works under pressure is structured around five phases. Each phase produces specific outputs that serve both operational needs during the incident and evidence needs during your audit.

Phase 1: Preparation

Preparation is everything you do before an incident occurs. This phase determines whether your team executes a practiced playbook or improvises under stress.

Define incident severity levels. Without clear severity classification, every alert gets treated with the same urgency, which in practice means nothing gets the urgency it deserves.

Severity | Criteria | Response Time | Examples
P1 (Critical) | Active data breach, complete service outage, or active exploitation of production systems | Immediate (within 15 minutes) | Customer data exfiltration, ransomware encrypting production, full platform unavailable
P2 (High) | Partial service degradation, confirmed vulnerability being actively targeted, or unauthorized access detected | Within 1 hour | Authentication service degraded, brute force attack succeeding against accounts, unauthorized admin access
P3 (Moderate) | Security event requiring investigation, minor service impact, or policy violation | Within 4 hours | Suspicious login patterns, non-critical service failure, employee accessing unauthorized resources
P4 (Low) | Informational security events, near-misses, or minor policy exceptions | Within 24 hours | Failed phishing attempt (no credentials compromised), vulnerability scan finding, security tool misconfiguration
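Encoding the severity table as data lets tooling check response times automatically and gives you a record of missed targets to use as audit evidence. A minimal Python sketch, assuming the response clock starts when the alert is raised (the table does not say); `Severity`, `RESPONSE_TARGETS`, and `missed_target` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Severity(Enum):
    P1 = "Critical"
    P2 = "High"
    P3 = "Moderate"
    P4 = "Low"


# Response-time targets copied from the severity table.
RESPONSE_TARGETS = {
    Severity.P1: timedelta(minutes=15),
    Severity.P2: timedelta(hours=1),
    Severity.P3: timedelta(hours=4),
    Severity.P4: timedelta(hours=24),
}


def response_due(severity: Severity, raised_at: datetime) -> datetime:
    """Latest acceptable first-response time, measured from when the alert was raised."""
    return raised_at + RESPONSE_TARGETS[severity]


def missed_target(severity: Severity, raised_at: datetime, responded_at: datetime) -> bool:
    """True if the first response came after the target for this severity."""
    return responded_at > response_due(severity, raised_at)


raised = datetime(2026, 3, 6, 9, 0, tzinfo=timezone.utc)
print(missed_target(Severity.P2, raised, raised + timedelta(minutes=45)))
print(missed_target(Severity.P1, raised, raised + timedelta(minutes=20)))
```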

Establish the incident response team. Define roles before the incident, not during it.

  • Incident Commander: Owns the response. Makes decisions on containment strategy, resource allocation, and communication timing. For SaaS companies, this is typically a senior engineering manager or VP of Engineering.
  • Technical Lead: Directs the technical investigation and remediation. Coordinates engineering resources. Usually the most senior on-call engineer or security engineer.
  • Communications Lead: Manages internal and external communications. Drafts customer notifications, coordinates with legal on regulatory notifications, and manages status page updates. Often someone from customer success or marketing with security awareness.
  • Scribe: Documents everything in real time: decisions made, actions taken, timeline of events. This role is critical for the post-incident review and for audit evidence. Often overlooked, always regretted when missing.
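A scribe's log is most useful when every entry carries a timestamp, an author, and a type, so the post-incident timeline can be rebuilt directly from it. The following Python sketch is one hypothetical way to structure such a log; the class names and the four entry kinds are assumptions, not a prescribed format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed entry types; adapt to whatever your team actually records.
ENTRY_KINDS = {"decision", "action", "communication", "observation"}


@dataclass(frozen=True)
class TimelineEntry:
    at: datetime
    author: str
    kind: str
    text: str


@dataclass
class IncidentTimeline:
    """Scribe log for one incident; entries are frozen and there is no edit method."""

    incident_id: str
    entries: list[TimelineEntry] = field(default_factory=list)

    def record(self, author: str, kind: str, text: str,
               at: datetime | None = None) -> TimelineEntry:
        if kind not in ENTRY_KINDS:
            raise ValueError(f"unknown entry kind: {kind!r}")
        # Default to the current UTC time so every entry uses the same clock.
        entry = TimelineEntry(at or datetime.now(timezone.utc), author, kind, text)
        self.entries.append(entry)
        return entry

    def render(self) -> str:
        # Sort by timestamp so notes entered late still appear in order.
        ordered = sorted(self.entries, key=lambda e: e.at)
        return "\n".join(
            f"{e.at.isoformat()} [{e.kind}] {e.author}: {e.text}" for e in ordered
        )


t0 = datetime(2026, 3, 6, 14, 30, tzinfo=timezone.utc)
log = IncidentTimeline("INC-2026-03-042")
log.record("scribe", "action", "Revoked leaked API key", at=t0.replace(minute=41))
log.record("ic", "decision", "Declared P1", at=t0)
print(log.render())
```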

Set up communication channels. Define these in advance and ensure everyone knows where to go:

  • Dedicated incident Slack channel (or equivalent) created immediately when an incident is declared, with a naming convention like #incident-2026-03-042
  • PagerDuty (or equivalent) escalation policies that automatically notify the right people based on severity
  • War room procedures for P1/P2 incidents: video call link, bridge line, or physical room
  • Out-of-band communication for scenarios where primary systems are compromised (personal phones, alternative messaging platform)
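The example channel name suggests a year-month-sequence convention. Assuming the sequence restarts each month (the example does not say whether it is monthly or annual), a hypothetical helper can pick the next free name from the channels that already exist:

```python
import re
from datetime import date

# Assumed convention: incident-YYYY-MM-NNN, with NNN restarting each month.
CHANNEL_PATTERN = re.compile(r"^incident-(\d{4})-(\d{2})-(\d{3})$")


def next_incident_channel(declared_on: date, existing: list[str]) -> str:
    """Return the next unused channel name for the month of `declared_on`."""
    used = []
    for name in existing:
        match = CHANNEL_PATTERN.match(name.lstrip("#"))
        if match and (int(match[1]), int(match[2])) == (declared_on.year, declared_on.month):
            used.append(int(match[3]))
    sequence = max(used, default=0) + 1
    if sequence > 999:
        raise ValueError("more than 999 incidents this month; widen the counter")
    return f"incident-{declared_on:%Y-%m}-{sequence:03d}"


print(next_incident_channel(date(2026, 3, 6), ["#incident-2026-03-041", "general"]))
```

The leading # is how chat clients display a channel; the helper tolerates it in existing names and returns the bare name.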

Prepare notification templates. Drafting customer notifications during an active incident leads to delayed, unclear, or legally problematic communications. Prepare templates for:

  • Initial customer notification (we are aware, we are investigating)
  • Status updates (what we know, what we are doing, expected next update)
  • Resolution notification (what happened, what we did, what we are changing)
  • Regulatory notification (if personal data is involved; see GDPR breach notification)
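Pre-approved templates are safer to use under pressure if a missing detail stops the message instead of leaking a raw placeholder to customers. A minimal Python sketch using the standard library's `string.Template`, whose `substitute` method raises `KeyError` when a field is missing; the template wording and field names are hypothetical and would need legal review. The regulatory template is left out because its content depends on the jurisdiction.

```python
from string import Template

# Hypothetical pre-approved wording; have legal review these before you need them.
TEMPLATES = {
    "initial": Template(
        "We are aware of an issue affecting $service and are investigating. "
        "Next update by $next_update."
    ),
    "update": Template(
        "Update on $service: $summary. Current action: $action. "
        "Next update by $next_update."
    ),
    "resolution": Template(
        "Resolved: $service. What happened: $summary. "
        "What we are changing: $changes."
    ),
}


def render_notice(kind: str, **fields: str) -> str:
    # substitute() raises KeyError for any missing field, so an incomplete
    # notice fails loudly instead of reaching customers with a raw "$placeholder".
    return TEMPLATES[kind].substitute(**fields)


print(render_notice("initial", service="the API", next_update="15:00 UTC"))
```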

Document escalation paths. Who can declare a P1? Who approves customer notifications? Who authorizes taking a service offline? Document these decision authorities so they are not debated during an incident. Link your incident response procedures to your broader policies and procedures framework.
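Writing the decision authorities down as data, in addition to prose, makes gaps easy to spot before an incident rather than during one. A hypothetical Python sketch; the role names mirror the team roles above, while `legal_counsel`, `vp_engineering`, and the specific assignments are illustrative assumptions.

```python
# Hypothetical decision-authority matrix: decision -> roles allowed to make it.
DECISION_AUTHORITY: dict[str, frozenset[str]] = {
    "declare_p1": frozenset({"incident_commander", "technical_lead"}),
    "approve_customer_notification": frozenset({"incident_commander"}),
    "approve_regulatory_notification": frozenset({"legal_counsel"}),
    "take_service_offline": frozenset({"incident_commander", "vp_engineering"}),
}


def can_decide(role: str, decision: str) -> bool:
    # Unknown decisions are denied rather than silently allowed.
    return role in DECISION_AUTHORITY.get(decision, frozenset())


def undocumented(decisions: list[str]) -> list[str]:
    """Decisions the team expects to make that have no named authority yet."""
    return [d for d in decisions if not DECISION_AUTHORITY.get(d)]
```

Running `undocumented` against the decisions raised in a tabletop exercise is one way to find escalation paths nobody has written down.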

Phase 2: Detection and Analysis

Detection is where your monitoring investment pays off, or where gaps become painfully visible.

Monitoring and alerting systems. SOC 2 (CC7.2) requires that you monitor system components to detect anomalies. For SaaS companies, this typically means:

  • SIEM or log aggregation: Centralized logging from all production systems, with correlation rules that identify suspicious patterns (multiple failed logins, unusual data access volumes, privilege escalation attempts)
  • Application performance monitoring (APM): Detects anomalies in application behavior that may indicate compromise (unexpected latency spikes, unusual API call patterns, unexplained error rate increases)
  • Infrastructure monitoring: CPU, memory, disk, and network monitoring that catches resource-based attacks (cryptomining, DDoS) and operational issues that could indicate compromise
  • Endpoint detection and response (EDR): Monitors employee devices for malware, unauthorized software, and suspicious behavior
  • Cloud security monitoring: AWS CloudTrail, Google Cloud Audit Logs, or Azure Activity Logs, with alerts on security-relevant events (IAM changes, security group modifications, data access outside normal patterns)
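A correlation rule such as "multiple failed logins" is essentially a sliding-window count per source. Real SIEMs express this in their own query languages; the Python sketch below only illustrates the logic, assuming events arrive in timestamp order and using hypothetical threshold and window values.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class FailedLoginRule:
    """Flag a source that fails `threshold` logins within `window` (hypothetical rule)."""

    def __init__(self, threshold: int = 10, window: timedelta = timedelta(minutes=5)):
        self.threshold = threshold
        self.window = window
        # One queue of failure timestamps per source IP.
        self._events: dict[str, deque[datetime]] = defaultdict(deque)

    def observe(self, source_ip: str, at: datetime) -> bool:
        """Record one failed login; return True when the threshold is reached."""
        q = self._events[source_ip]
        q.append(at)
        # Drop failures that have fallen out of the sliding window.
        while q and at - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold
```

In production this logic would also need to handle out-of-order events and distributed brute force spread across many IPs, which is where a SIEM's built-in correlation earns its cost.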

Triage process. When an alert fires, your team needs a structured approach to determine what they are dealing with:

  1. Verify the alert: Is this a true positive or a false positive? Check corroborating data sources.
  2. Classify the event: Does this meet the criteria for a security incident? Use your severity definitions.
  3. Assign severity: Based on your P1-P4 criteria, classify the severity to determine response urgency and team mobilization.
  4. Activate response: If classified as an incident, activate the response team according to the severity level.
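The first three triage steps map onto a function that rejects unverified alerts and then walks the severity table from most to least severe. A simplified Python sketch; the `Alert` fields are hypothetical booleans standing in for judgments your responders would actually make, and step 4 (activation) is left to your paging tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Alert:
    corroborated: bool                  # step 1: confirmed by a second data source
    data_breach: bool = False
    active_exploitation: bool = False
    unauthorized_access: bool = False
    vuln_actively_targeted: bool = False
    needs_investigation: bool = False
    policy_violation: bool = False
    service_impact: str = "none"        # "none", "minor", "partial", or "outage"


def triage(alert: Alert) -> str | None:
    """Return "P1".."P4", or None when the alert is a false positive."""
    if not alert.corroborated:
        # The caller should still log this evaluation: every alert must be evaluated.
        return None
    # Steps 2-3: check the severity criteria from most to least severe.
    if alert.data_breach or alert.active_exploitation or alert.service_impact == "outage":
        return "P1"
    if alert.unauthorized_access or alert.vuln_actively_targeted or alert.service_impact == "partial":
        return "P2"
    if alert.needs_investigation or alert.policy_violation or alert.service_impact == "minor":
        return "P3"
    return "P4"
```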

Initial assessment checklist. Within the first 30 minutes of declaring an incident, the technical lead should be able to answer:

  • What systems are affected?
  • Is customer data potentially exposed?
  • Is the attack/failure still active?
  • What is the blast radius (one customer, all customers, one region, all regions)?
  • Are we legally obligated to notify anyone (GDPR, contractual obligations)?

Evidence preservation. From the moment an incident is suspected, preserve forensic evidence: avoid rebooting affected systems (unless containment requires it), enable enhanced logging, capture memory dumps if malware is suspected, and preserve network flow data. Destroyed evidence cannot be analyzed, and it creates gaps in your incident timeline that auditors and regulators will question.
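Hashing artifacts at collection time lets you show later that logs and images were not altered after the fact, which supports both forensic analysis and audit questions about the timeline. A minimal Python sketch using `hashlib.sha256`; the manifest layout and the `evidence_manifest` helper are hypothetical, and capturing the memory or disk images themselves requires dedicated tooling.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evidence_manifest(paths: list[Path], collected_by: str) -> str:
    """Return a JSON manifest recording who collected each artifact, when, and its hash."""
    items = [
        {"path": str(p), "bytes": p.stat().st_size, "sha256": sha256_file(p)}
        for p in paths
    ]
    return json.dumps(
        {
            "collected_by": collected_by,
            "collected_at": datetime.now(timezone.utc).isoformat(),
            "items": items,
        },
        indent=2,
    )
```

Storing the manifest apart from the artifacts (for example, attached to the incident record) and re-hashing before analysis confirms that nothing changed in between.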

Phase 3: Containment

Containment stops the bleeding. The goal is to prevent the incident from escalating while preserving your ability to investigate.

Short-term containment (immediate actions to limit damage):

  • Isolate affected systems from the network (without shutting them down, so their forensic state is preserved)
  • Revoke compromised credentials and rotate affected API keys
  • Block malicious IP addresses or domains at the firewall/WAF level
  • Disable compromised user accounts
  • Enable enhanced monitoring on potentially affected systems

Long-term containment (sustainable measures while you work on eradication):

  • Deploy patches or configuration changes to close the vulnerability
  • Implement additional monitoring for the attack vector
  • Set up temporary access controls to limit exposure
  • Redirect traffic away from compromised components to clean systems

Decision framework: when to take services offline. This is one of the hardest decisions during an incident. Taking a service offline stops the attack but also stops your customers' businesses. Factors to weigh:

  • Take offline if: Active data exfiltration is occurring and cannot be stopped otherwise, the attacker has persistent access that isolation cannot contain, or the integrity of customer data cannot be guaranteed while the system is running
  • Contain in place if: The attack vector can be closed without downtime, affected systems can be isolated at the network level, and the blast radius is contained to a subset of systems
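The two sets of conditions above can be written as a small decision function. This Python sketch is hypothetical: the field names are assumptions, and it returns an explicit escalation outcome when neither set of conditions fully applies, since the framework leaves that case to the Incident Commander's judgment.

```python
from dataclasses import dataclass


@dataclass
class ContainmentFacts:
    exfiltration_ongoing: bool
    exfiltration_stoppable_otherwise: bool
    persistent_access_uncontainable: bool
    data_integrity_guaranteed: bool
    vector_closable_without_downtime: bool
    isolatable_at_network_level: bool
    blast_radius_limited: bool


def containment_decision(f: ContainmentFacts) -> str:
    # Any single "take offline" condition is enough.
    if ((f.exfiltration_ongoing and not f.exfiltration_stoppable_otherwise)
            or f.persistent_access_uncontainable
            or not f.data_integrity_guaranteed):
        return "take_offline"
    # Containing in place requires all three conditions together.
    if (f.vector_closable_without_downtime
            and f.isolatable_at_network_level
            and f.blast_radius_limited):
        return "contain_in_place"
    return "escalate_to_incident_commander"
```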

Customer communication during containment. For P1 and P2 incidents affecting customer-facing services, communicate early and honestly. Customers notice when your service is degraded, and silence erodes trust faster than bad news. Update your status page, send proactive notifications to affected customers, and commit to regular update intervals (every 30-60 minutes for P1, every 2-4 hours for P2).
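A committed update cadence is easier to keep when the schedule is computed up front. A hypothetical Python sketch that validates the chosen interval against the ranges above (30-60 minutes for P1, 2-4 hours for P2) and lists the next update times; `update_schedule` is an illustrative name.

```python
from datetime import datetime, timedelta, timezone

# Allowed update intervals from the guidance above.
UPDATE_INTERVAL_RANGE = {
    "P1": (timedelta(minutes=30), timedelta(minutes=60)),
    "P2": (timedelta(hours=2), timedelta(hours=4)),
}


def update_schedule(severity: str, first_notice: datetime,
                    interval: timedelta, count: int) -> list[datetime]:
    """Return the next `count` committed update times after the first notice."""
    if severity not in UPDATE_INTERVAL_RANGE:
        raise ValueError(f"no fixed update cadence defined for {severity}")
    low, high = UPDATE_INTERVAL_RANGE[severity]
    if not low <= interval <= high:
        raise ValueError(f"{severity} updates must be every {low} to {high}")
    return [first_notice + interval * n for n in range(1, count + 1)]


first = datetime(2026, 3, 6, 10, 0, tzinfo=timezone.utc)
for due in update_schedule("P1", first, timedelta(minutes=30), 3):
    print(due.isoformat())
```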

Phase 4: Eradication and Recovery

Once the incident is contained, eliminate the root cause and restore normal operations.

Root cause identification. Determine exactly how the incident occurred. This is not optional: without root cause analysis, you cannot be confident that the same attack vector will not be exploited again. Common root causes in SaaS incidents include unpatched vulnerabilities, misconfigured cloud resources, compromised credentials (often from credential reuse), vulnerable dependencies, and insufficient input validation.

Remove the threat. Based on the root cause:

  • Patch the vulnerability that was exploited
  • Remove malware or unauthorized access mechanisms (backdoors, unauthorized SSH keys, rogue IAM roles)
  • Rebuild compromised systems from known-good images rather than attempting to clean them
  • Rotate all credentials that may have been exposed, including service accounts and API keys

System restoration and validation. Before restoring service:

  • Verify that the vulnerability is fully patched across all affected systems
  • Confirm that unauthorized access mechanisms have been removed
  • Validate data integrity: compare backups against production data to detect tampering
  • Run security scans against restored systems
  • Confirm monitoring is capturing events from restored systems
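One way to approach the data-integrity check is to hash each record in the backup and in production and compare by primary key. The Python sketch below assumes records are available as dictionaries keyed by ID (large tables would use database-side checksums instead); the function names are hypothetical. Legitimate writes made after the backup also show up as changes, so the output is a list of records to review against change logs, not proof of tampering.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so field order does not change the hash.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_against_backup(backup: dict[str, dict],
                        production: dict[str, dict]) -> dict[str, list[str]]:
    """Compare records keyed by primary key; return IDs to review in each bucket."""
    changed = sorted(
        key for key in backup.keys() & production.keys()
        if record_digest(backup[key]) != record_digest(production[key])
    )
    return {
        "changed": changed,
        "missing_in_production": sorted(backup.keys() - production.keys()),
        "new_in_production": sorted(production.keys() - backup.keys()),
    }
```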

Monitor for recurrence. After restoration, maintain heightened monitoring for at least 30 days. Sophisticated attackers may have established multiple persistence mechanisms, and eradicating one does not guarantee the others are eliminated.

Phase 5: Post-Incident Review

The post-incident review is where your organization learns from the incident and improves its defenses. SOC 2 auditors give significant weight to this phase because it demonstrates your organization's commitment to continuous improvement.

Blameless post-mortem process. Conduct the review within 5 business days of incident resolution while details are fresh. The review must be blameless β€” focused on systemic factors, not individual fault. If people are afraid to be honest about what happened, your review will miss the insights that prevent future incidents.
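Turning "within 5 business days" into a concrete due date avoids arguments about weekends and holidays. A minimal Python sketch, assuming counting starts the day after resolution and that you supply your own holiday list; `add_business_days` is a hypothetical helper.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int,
                      holidays: frozenset[date] = frozenset()) -> date:
    """Return the date `days` business days after `start` (weekends and holidays skipped)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


resolved = date(2026, 3, 6)  # a Friday
print(add_business_days(resolved, 5))  # 2026-03-13
```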

Document the following:

  • Timeline: Minute-by-minute reconstruction of the incident from first detection to full resolution
  • Root cause: Technical root cause and contributing factors (process failures, monitoring gaps, training deficiencies)
  • Impact: Number of customers affected, data exposed, service downtime, financial cost
  • What went well: Detection speed, team coordination, communication effectiveness
  • What could be improved: Gaps discovered, processes that failed, tools that were missing
  • Action items: Specific, assigned, time-bound improvements
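Auditors look for action items that are specific, assigned, and time-bound, and that actually get completed. A hypothetical Python sketch that flags items missing an owner or due date and items still open past their due date; the `ActionItem` fields are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str | None = None
    due: date | None = None
    done: bool = False


def action_item_problems(items: list[ActionItem], today: date) -> list[str]:
    """Flag items that are unassigned, have no due date, or are open past their due date."""
    problems = []
    for item in items:
        if not item.owner:
            problems.append(f"unassigned: {item.description}")
        if item.due is None:
            problems.append(f"no due date: {item.description}")
        elif not item.done and item.due < today:
            problems.append(f"overdue: {item.description}")
    return problems
```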

Update your risk register. Every incident should feed back into your risk assessment. Did the incident reveal a risk you had not identified? Was a risk scored too low? Did a control you relied on fail to operate effectively? Update your risk register to reflect what you learned.

Share learnings across the organization. Publish a sanitized version of the post-mortem internally. Engineering teams that were not involved in the incident may have similar vulnerabilities in their systems. Sharing learnings prevents the same class of incident from recurring in a different part of your platform.

Incident Response Testing

Having an incident response plan is necessary but not sufficient. SOC 2 expects you to test that plan and demonstrate that your team can execute it.

Tabletop exercises: Walk through a hypothetical incident scenario with your response team. The facilitator presents the scenario in stages, and the team discusses how they would respond at each stage. This tests decision-making, communication, and coordination without impacting production systems. Tabletop exercises should cover a variety of scenarios: data breach, ransomware, DDoS, insider threat, and vendor compromise.

Simulated incidents: Inject a realistic security event into your monitoring systems and observe whether your team detects and responds correctly. This tests your detection capabilities and response procedures under conditions closer to reality than a tabletop exercise.

Red team exercises: Engage an external team (or internal red team) to simulate real attacks against your systems. This tests your detection, response, and containment capabilities against adversarial behavior. Red team exercises are the most realistic test but also the most resource-intensive.

Frequency: Conduct at least one tabletop exercise annually. SaaS companies with mature security programs run them quarterly, rotating through different scenarios. Simulated incidents and red team exercises can run less often than quarterly tabletops, but each should still occur at least annually.

Document test results and improvements. Every test should produce a written report documenting what was tested, what worked, what failed, and what improvements will be made. This documentation is key audit evidence: auditors will specifically request evidence of incident response testing.

What Auditors Test

Understanding what auditors examine helps you prepare evidence proactively rather than scrambling during the audit window.

Incident response policy and procedures exist and are current. Auditors will request your incident response policy and check the last review/update date. A policy dated three years ago is a finding. Review and update your policy at least annually, and sooner when significant changes occur.

Team members know their roles. Auditors may interview team members to verify they understand their responsibilities during an incident. Training records and exercise participation logs serve as supporting evidence.

Monitoring and detection capabilities are in place and functioning. Auditors will verify that the monitoring systems described in your system description are actually deployed, configured, and generating alerts. They may request sample alerts to confirm the system is active.

Evidence of actual incident handling. If security incidents occurred during the observation period, auditors will request documentation of how they were handled. This includes the incident timeline, classification, response actions, customer notifications (if applicable), and post-incident review. Handling incidents well during the observation period is strong evidence of operating effectiveness.

Evidence of testing. If no incidents occurred (or even if they did), auditors will request evidence that you tested your incident response capability. Tabletop exercise reports, simulation results, and red team findings all qualify.

Post-incident reviews and follow-up actions. Auditors want to see that lessons learned from incidents and tests were translated into actual improvements: updated procedures, new controls, enhanced monitoring. A post-mortem with action items that were never completed is worse than no post-mortem at all.

Common Mistakes

No defined severity levels. When everything is treated with the same urgency, nothing gets the urgency it deserves. P1 incidents require immediate all-hands response. P4 incidents can wait until business hours. Without severity definitions, your team either over-responds to minor events (causing fatigue) or under-responds to critical ones (causing damage).

Incident response plan that has never been tested. A plan that exists only on paper provides false confidence. When a real incident occurs and the team opens the plan for the first time, they discover outdated contact information, undefined escalation paths, and procedures that do not match their current architecture. Test annually at minimum.

No post-incident review process. Incidents without post-mortems are wasted learning opportunities. Worse, they signal to auditors that your organization does not have a continuous improvement mindset. Every P1 and P2 incident should have a documented post-incident review. P3 incidents should be reviewed in aggregate at least monthly.

Customer notification delays. SaaS companies often delay customer notifications because they want complete information before communicating. This is a mistake. Customers and regulators expect timely notification, even if initial information is incomplete. Your notification templates should support staged communication: “We are aware and investigating” followed by detailed updates as information becomes available.

Not preserving forensic evidence. In the rush to restore service, teams often reboot systems, redeploy from scratch, or rotate logs before forensic analysis is complete. Establish a clear protocol: preserve first, then restore. If you must restore service immediately, capture disk images and memory dumps before making changes.

Incident response team that excludes non-technical stakeholders. Incident response is not purely a technical exercise. Legal counsel needs to assess notification obligations. Customer success needs to manage customer communications. Executive leadership needs to make business decisions about service continuity. Include these stakeholders in your plan and your testing exercises.

How GRCTrail Helps

GRCTrail gives SaaS teams the structure and tooling to manage incidents from detection through post-mortem review, while automatically generating the evidence auditors need.

  • Incident response playbook templates pre-configured for common SaaS incident types (data breach, ransomware, DDoS, insider threat, vendor compromise), with step-by-step procedures your team can follow under pressure
  • Incident tracking and timeline documentation that captures every action, decision, and communication during an incident in a timestamped, auditable format
  • Notification workflow management that tracks customer and regulatory notification obligations, deadlines, and completion status so nothing slips through the cracks
  • Post-incident review templates with structured fields for timeline, root cause, impact, lessons learned, and action items, formatted for both internal learning and auditor review
  • Audit evidence generation that automatically compiles incident records, test results, and improvement tracking into the format your auditor expects

