Security Operations

Incident Response Basics

9 min read · Updated 2026-04-26
TL;DR

Incident response is the structured process of detecting, containing, eradicating, and recovering from a security incident, and then learning from it. The NIST cycle (Preparation, Detection and Analysis, Containment, Eradication, Recovery, Lessons Learned) is the most widely used model. The teams that handle incidents well prepare before anything happens, resist the urge to wipe machines before evidence is preserved, and call external IR firms early rather than late.

What it is

Incident response (IR) is the discipline of handling a security incident from the moment it is detected through to the moment the organisation is back to normal operations and has captured what it learned. It is not a tool, it is not a single team, and it is not something that starts when the pager fires. It is a programme that combines preparation, technical investigation, decision-making under pressure, communication, and post-incident improvement.

The work spans technical activity (forensic analysis, containment actions, recovery operations) and non-technical activity (executive communication, legal and regulatory engagement, customer notification, sometimes ransom negotiation). A good IR programme runs all of these in parallel under a single coordinated structure, usually with someone designated as the incident commander.

Why it matters

The difference between a contained incident and a public crisis is almost always the quality of the response, not the sophistication of the attack. The same ransomware group, attacking two organisations in the same week using the same techniques, can produce two completely different outcomes: one where the affected systems are isolated within hours and the business continues operating, and one where the entire estate is encrypted, recovery takes weeks, and the regulator gets involved.

The reasons this happens are not glamorous:

  • One organisation rehearsed. They had a runbook, they knew who to call, they had backups they had tested.
  • The other did not. They scrambled to find out who owned what, made decisions in the wrong order, destroyed evidence accidentally, and called for outside help only after the situation was unrecoverable.

Incident response also matters for non-incident reasons. Regulators in most jurisdictions now require evidence of a defined IR capability. Cyber insurance carriers ask about it. Customers and partners increasingly include IR maturity in vendor risk assessments. The programme exists whether the organisation invests in it or not; the choice is whether it exists on paper or in working order.

The NIST IR cycle

The most commonly used model is the NIST SP 800-61 incident response cycle. NIST itself defines four phases, grouping containment, eradication, and recovery into one, but the cycle is usually taught as six, which run as a loop rather than a straight line.

Preparation

Preparation is everything you do before an incident. It is the phase that most determines outcomes.

Components include:

  • Documented IR plan. Roles, escalation paths, decision authority, communication trees.
  • Incident classification scheme. Event vs incident vs major incident, and what triggers what response level.
  • Tooling. EDR, SIEM, log retention long enough to investigate, forensic acquisition tools.
  • Backups that have been tested. Many organisations find out during ransomware that the backups were also encrypted, or that nobody had tested restoring them in years.
  • Playbooks. Specific procedures for common scenarios: ransomware, BEC, data exfiltration, insider threat, lost or stolen device.
  • Tabletop exercises. Walking through scenarios with the full team to validate the plan.
  • External relationships. Pre-agreed retainers with IR firms, law enforcement contacts, legal counsel, PR firm if needed.
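An incident classification scheme like the one above is more useful as executable policy than as prose, because it removes argument during triage. A minimal sketch in Python; the severity levels, signal fields, and thresholds here are illustrative assumptions, not drawn from any standard:

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    EVENT = 1           # anomaly, no confirmed impact
    INCIDENT = 2        # confirmed compromise, limited scope
    MAJOR_INCIDENT = 3  # business-critical impact or regulatory exposure

@dataclass
class Signal:
    confirmed_compromise: bool
    systems_affected: int
    personal_data_involved: bool
    business_critical: bool

def classify(sig: Signal) -> Severity:
    """Map a triage signal to a response level (illustrative thresholds)."""
    if sig.personal_data_involved or sig.business_critical or sig.systems_affected >= 10:
        return Severity.MAJOR_INCIDENT if sig.confirmed_compromise else Severity.INCIDENT
    if sig.confirmed_compromise:
        return Severity.INCIDENT
    return Severity.EVENT
```

The point of encoding it is the trigger mapping: a major incident pages the incident commander and legal immediately, while an event goes to the normal triage queue.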

Most organisations under-invest in preparation because the work has no immediate visible payoff. The payoff arrives when an incident happens and the team can move in hours rather than days.

Detection and analysis

Detection sources include EDR alerts, SIEM correlations, user reports, third-party notifications, and external sources (CISA, FBI, threat intel feeds).

Analysis confirms the alert is a real incident, scopes it, and characterises what is happening. Questions to answer: is this a real incident or false positive, what is the scope, what is the attacker doing right now, what initial access vector was used, what is the apparent objective, is it ongoing or finished.

Analysis is the hardest phase to time-box. Pressure to act is high, but the risk of acting on partial information is also high. Mature teams resist the urge to skip this phase.

Containment

Containment limits the blast radius. It is not eradication. The goal is to stop the incident from getting worse while you figure out what to do next.

Containment can be:

  • Short-term. Network isolation of an infected host, disabling a compromised account, blocking a malicious IP at the firewall, revoking a session token.
  • Long-term. Rebuilding a compromised system in parallel while keeping the original isolated for evidence, applying compensating controls across an entire segment.

The trade-off in containment is between stopping the attack and preserving evidence. Pulling the network cable on an infected host stops it talking to C2. It also kills volatile memory that might contain credentials, malware behaviour, or attribution clues. The decision needs to be deliberate, not panicked.
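One way to make that decision deliberate is to encode the order of operations so volatile evidence is captured before the host is cut off. The sketch below assumes a generic `edr` client object; the method names are hypothetical stand-ins, not a real product API:

```python
# Order of operations for containing a single host while preserving evidence.
# `edr` stands in for whatever EDR/forensics tooling is in use; these method
# names are hypothetical, not any vendor's actual API.

def contain_host(edr, host_id: str) -> list[str]:
    actions = []
    # 1. Capture volatile state first: memory is lost if the host is powered
    #    off, and may hold credentials and unpacked malware.
    edr.acquire_memory_image(host_id)
    actions.append("memory acquired")
    # 2. Snapshot network connections and running processes for the timeline.
    edr.collect_triage_artifacts(host_id)
    actions.append("triage artifacts collected")
    # 3. Only then cut the host off from C2. Isolation through the EDR keeps
    #    the analyst channel open, unlike pulling the network cable.
    edr.isolate_host(host_id)
    actions.append("host isolated")
    return actions
```

For a fast-spreading worm or active encryption, the order may justifiably flip: stopping the spread can outweigh the memory image.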

Eradication

Eradication removes the attacker from the environment. The visible signs (malware on disk, the obvious backdoor) are usually only part of the picture. Eradication includes removing all persistence mechanisms, patching the initial access vulnerability, resetting credentials the attacker touched, rotating exposed keys and tokens, closing accounts the attacker created, and reviewing for second-stage payloads.

Premature eradication is one of the most common mistakes. If you eradicate before fully understanding the scope, the attacker frequently still has another foothold and is back in within days, this time more careful about hiding.

Recovery

Recovery brings systems back to normal operations: restoring from clean backups, bringing services online in a controlled order, heightened monitoring on previously affected systems, validating that the attacker is actually gone (not assuming), and communicating to stakeholders.

The temptation to declare victory and move on is strongest here. Resisting it matters because attackers commonly return to environments they already know.

Lessons learned

The post-incident review is the phase most organisations skip. It is also the phase that produces the most long-term value.

A good session includes timeline reconstruction (what happened, when, who did what), what worked, what did not, action items with owners and dates, and metrics (time to detect, contain, recover) tracked across incidents.

The output is not a report that gets filed. The output is a list of changes that get implemented, ideally with the next exercise scheduled to test them.
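The metrics fall directly out of the timeline reconstruction. A minimal sketch of the derivation, assuming four timestamp anchors per incident (the field names are illustrative):

```python
from datetime import datetime, timedelta

def incident_metrics(t_start: datetime, t_detect: datetime,
                     t_contain: datetime, t_recover: datetime) -> dict[str, timedelta]:
    """Derive the standard IR metrics from four timeline anchors."""
    return {
        "time_to_detect": t_detect - t_start,     # a.k.a. dwell time
        "time_to_contain": t_contain - t_detect,
        "time_to_recover": t_recover - t_contain,
    }
```

Averaging these across incidents gives the mean-time figures (MTTD, MTTC, MTTR) that make improvement visible from one exercise or incident to the next.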

How it works in practice

The phases above are clean. Real incidents are not. A few patterns from real responses:

Detection is often external. A meaningful percentage of incidents are detected by someone other than the organisation: customers, partners, researchers, or law enforcement. Internal detection has improved with EDR but external detection is still common.

The first hour is mostly confusion. The initial alert is rarely the full picture. Resisting the urge to act on partial information is hard but important.

Communication is parallel work. While the technical team investigates, someone needs to be talking to executives, legal, comms, and possibly regulators. The two streams run in parallel, with the incident commander connecting them.

External help is faster when retained in advance. A retainer with an IR firm means they can be on the scene in hours. Calling cold means a contract negotiation in the middle of a crisis. Major IR firms (Mandiant, CrowdStrike Services, Unit 42, Kroll, Stroz Friedberg) all offer retainers.

The Cyber Kill Chain as an alternative model

The Lockheed Martin Cyber Kill Chain (published 2011) defines an intrusion as seven stages: Reconnaissance, Weaponisation, Delivery, Exploitation, Installation, Command and Control, Actions on Objectives.

In practice most teams use three models side by side: MITRE ATT&CK for the catalogue of specific adversary techniques, the Kill Chain for the narrative arc of an intrusion, and the NIST cycle for the response process. Each answers a different question.

Common mistakes

The things that go wrong repeatedly:

  • Over-containment that disrupts business. Isolating an entire segment when one host would have done. The cure becomes the disaster.
  • Premature eradication that destroys evidence. Wiping the infected machine before forensic acquisition removes any chance of understanding what happened.
  • Not preserving evidence properly. Powering off before memory acquisition, not capturing logs with short retention windows, not preserving disk images.
  • Not having a call tree. When the incident happens at 2am on a Sunday, who do you call first?
  • Calling external IR too late. By day three, evidence has degraded and the attacker has dug in. The IR firm starts from a worse position than they would have on day one.
  • No incident commander. Multiple people making decisions in parallel without coordination produces conflicting actions and confused stakeholders.
  • Skipping lessons learned. The same mistakes get repeated.
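The call-tree problem in particular is worth making concrete: the tree is only useful if it is executable at 2am. A minimal sketch, with the roles and ordering as illustrative assumptions; a real tree carries phone numbers, a backup for every role, and lives somewhere reachable when SSO and email are down:

```python
# Escalation order per severity level. Roles and ordering are illustrative.
CALL_TREE = {
    "major_incident": [
        "incident_commander",
        "incident_commander_deputy",  # if the first does not answer quickly
        "ciso",
        "legal_counsel",
    ],
    "incident": ["on_call_analyst", "soc_lead"],
}

def escalation_order(severity: str) -> list[str]:
    """Who to call, in order; unknown severities fall back to 'incident'."""
    return CALL_TREE.get(severity, CALL_TREE["incident"])
```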

Tabletop exercises and why they matter

A tabletop exercise is a guided walkthrough of a hypothetical incident, with the full IR team and relevant stakeholders, run by a facilitator who introduces information and complications as the scenario evolves.

What they reveal: whether the plan works when followed, whether the named people are still in their roles, whether decision authority is clear, whether external contacts are still valid, whether executives know what is expected of them.

Most organisations that run tabletops for the first time discover gaps nobody had anticipated. The cost of finding them in an exercise is much lower than finding them in a real incident. The recommended cadence is at least annually.

Ransomware response specifics

Ransomware deserves its own callouts because it is the most common major incident pattern:

  • Containment speed trumps eradication speed. Stop the spread first; rebuilding is the next phase.
  • Backups are the answer if they exist and are clean. If they do not, options narrow to paying or rebuilding from scratch.
  • The ransom decision is not just technical. Legal, regulatory, insurance, and sanctions considerations all factor in. Some jurisdictions prohibit payment to specific groups.
  • Negotiation is its own discipline. If payment is on the table, professional negotiators (often through IR firms) handle it.
  • Communication matters more than usual. Ransomware incidents often go public quickly because attackers publish on leak sites.

Regulatory disclosure timelines

Several jurisdictions have legally binding disclosure timelines:

  • GDPR (EU). Personal data breaches reported to the supervisory authority within 72 hours of becoming aware.
  • SEC (US public companies). Material cybersecurity incidents disclosed within four business days of materiality determination.
  • HIPAA (US healthcare). Affected individuals notified within 60 days of discovery; breaches affecting 500 or more individuals also reported to HHS and the media within the same window.
  • PCI DSS. Notification to acquirer and brands within 24 hours of confirmed compromise of cardholder data.
  • NIS2 (EU). Early warning within 24 hours, full incident notification within 72, final report within a month.

Timelines run from when you "become aware", a legal concept that needs early engagement with counsel. Decision-making about disclosure cannot wait until eradication is complete. Legal and compliance need to be part of the IR team from the first hour.
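Because several clocks start from the same "became aware" moment, it helps to compute them together the moment that timestamp is fixed. A rough sketch; this is a simplification for illustration only, since the legal trigger differs per regime (GDPR runs from awareness of a personal-data breach, the SEC clock from the materiality determination) and counsel makes that call:

```python
from datetime import datetime, timedelta

def add_business_days(start: datetime, days: int) -> datetime:
    """Advance by whole business days, skipping weekends (holidays ignored)."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday-Friday
            days -= 1
    return current

def disclosure_deadlines(aware: datetime) -> dict[str, datetime]:
    """Rough deadline clocks from the 'became aware' timestamp."""
    return {
        "gdpr_72h": aware + timedelta(hours=72),
        "nis2_early_warning_24h": aware + timedelta(hours=24),
        "nis2_notification_72h": aware + timedelta(hours=72),
        "sec_4_business_days": add_business_days(aware, 4),
    }
```

Note how the SEC clock stretches over weekends while the GDPR clock does not, which is exactly the kind of detail a team discovers too late without preparation.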

Best practices

  • Have a written IR plan that is owned and current. Not a binder on a shelf. A document the team has read, with people named and decision authority clear.
  • Run tabletops at least annually. Different scenarios each time. Include executives.
  • Pre-agree an external IR retainer. Cost is small. Speed gain in an incident is enormous.
  • Test backups regularly. Not just verifying they exist. Actually restoring data and confirming it works.
  • Define an incident commander role. One person, named, with authority to make calls during an incident.
  • Engage legal early. Disclosure obligations, evidence handling, and regulator interaction all require legal input from the start.
  • Preserve evidence before remediation. Disk images, memory captures, logs. Once gone they cannot be recovered.
  • Run a real lessons-learned after every incident. With named action items and follow-through.
  • Track metrics across incidents. Mean time to detect, contain, recover. Improvement only happens if you measure it.

The teams that do this well make incident response look almost boring: containment in hours, recovery in days, lessons learned implemented in weeks. The teams that do not handle every incident as if it were the first.

ScruteX provides the external visibility most IR teams lack: what's been leaked, what's appearing on dark web markets, who's claiming responsibility for an attack.
