Security Operations

Threat Hunting Basics

8 min read · Updated 2026-04-26
TL;DR

Threat hunting is the proactive search for attackers who have evaded your existing detections. It starts from a hypothesis (something got in; go find it) rather than a triggered alert. Real hunting follows a cycle of hypothesis, query, analysis, closure, and detection improvement. Most organisations that say they are hunting are actually doing alert triage under a different name. The difference matters because hunting catches what your alerts miss, and your alerts will miss things.

What it is

Threat hunting is the active, hypothesis-driven search for attackers who are already inside your environment but have not triggered any alerts. The starting assumption is simple and uncomfortable: your detections are incomplete, attackers are sometimes good enough to evade them, and the only way to find that subset is to go looking for them deliberately.

The work is investigative rather than reactive. A threat hunter does not start from an alert. They start from a question, usually informed by threat intelligence or knowledge of attacker tradecraft, and they go looking for evidence that the question's answer is yes. If the evidence is there, they have found something. If it is not, they have either confirmed the absence (useful) or discovered that they need a better question (also useful).

The output of threat hunting is not just "found bad guys". The more common output is improved detection. A hunt that did not find an active intrusion frequently produces a new detection rule, a query that should be running automatically, or a gap in telemetry that needs to be addressed. Hunting is one of the few security activities that compounds: every cycle makes the next one cheaper.

Why it matters

If detection rules covered everything that mattered, threat hunting would not be necessary. The reality is that detection rules cover the patterns we know about, on the data we already collect, with the logic we have already written. Anything outside that overlap (a novel technique, an attacker who lives off the land carefully, a behaviour that looks like noise in your environment specifically) gets through.

Industry numbers on dwell time vary by report and over time, but the point is consistent: attackers who get past initial detection often operate inside environments for weeks or months before being noticed. The Mandiant M-Trends reports have shown global median dwell times that have gradually fallen but still measure in days to weeks. The reductions are largely driven by external notification (a third party tells you), not by internal detection. Threat hunting is one of the few activities that reduces internal dwell time directly.

There is also a programme-quality argument. A SOC that only operates on alerts develops a particular blind spot: it gets very good at handling the things it has rules for and very bad at noticing the things it does not. Hunting forces the team to operate outside the rule set, which builds the analytical skills that matter when an actual incident shows up looking unfamiliar.

The hunting cycle

Hunting works as a repeatable cycle. The exact phrasing varies between models, but the substance is consistent.

Hypothesis

Every hunt starts with a specific, testable hypothesis: one concrete enough that you can name the data that would confirm or refute it.

Bad: "An attacker is in our network."

Better: "An attacker is using PowerShell with encoded commands to download payloads from external servers."

Best: "An attacker is using PowerShell with -EncodedCommand arguments to execute scripts that initiate outbound connections to non-corporate IP space within five seconds, on hosts that do not normally run PowerShell scripts."

Common sources for hypotheses: threat intelligence about a group active in your sector, a new technique published in research, a new ATT&CK sub-technique, an incident at a peer organisation, a pattern from a previous hunt, anomalies surfaced by analytics tools.

Query

Once you have a hypothesis, you build the queries that would surface evidence for or against it. This is where the technical depth shows up. A hypothesis about PowerShell encoded commands translates into queries against EDR process telemetry. A hypothesis about lateral movement translates into queries across authentication logs and network flows.
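To make that concrete, here is a minimal sketch of how the "Best" hypothesis from the previous section might translate into KQL against Microsoft Defender's Advanced Hunting schema. The seven-day lookback and five-second window are illustrative, and the "hosts that do not normally run PowerShell" part of the hypothesis is omitted for brevity (it would be another baseline join):

    // Encoded PowerShell followed by an outbound connection from the
    // same process within five seconds.
    let lookback = 7d;
    DeviceProcessEvents
    | where Timestamp > ago(lookback)
    | where FileName in~ ("powershell.exe", "pwsh.exe")
    | where ProcessCommandLine has_any ("-EncodedCommand", "-enc")
    | join kind=inner (
        DeviceNetworkEvents
        | where Timestamp > ago(lookback)
        | where RemoteIPType == "Public"   // rough proxy for non-corporate IP space
        | project DeviceId, InitiatingProcessId, ConnTime = Timestamp, RemoteIP, RemoteUrl
    ) on DeviceId, $left.ProcessId == $right.InitiatingProcessId
    | where ConnTime between (Timestamp .. Timestamp + 5s)
    | project Timestamp, DeviceName, AccountName, ProcessCommandLine, RemoteIP, RemoteUrl

The first version of a query like this almost always returns too much; the iteration described below is where it earns its keep.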

The queries should be specific enough that the result set is manageable. A query that returns a hundred thousand rows is unlikely to surface anything actionable. A query that returns ten rows is the right size for human review.

Iterating on the query is normal. The first version often returns too much or too little. Tightening it (adding context, excluding known-good patterns, joining additional sources) is part of the work.

Analyse

The query produces results. Analysis is the work of looking at those results and deciding what they mean. Most results in most hunts are benign. The hunter's job is to understand each one well enough to confirm benignity or escalate.

Analysis usually involves correlating across multiple data sources: the EDR process record joins with network logs, authentication logs, DNS data, and possibly user behaviour data. Each correlation tightens the picture.

The discipline here is to resist confirmation bias in both directions. The hunter wants to find something (because finding something is success). The hunter also wants to not find something (because if there is nothing, they get to close out and move on). Both biases distort analysis. Good hunters write down what they expected to find before they look at the data, then compare.

Close

Every hunt closes one of three ways:

  • Found something. The hypothesis was confirmed. The finding is handed off to incident response.
  • Found nothing. The hypothesis was refuted, or the data does not support it. The hunt closes with a note on what was checked.
  • Found an adjacent issue. Not what was being hunted, but something else worth following up on. Spawn a separate hunt or hand off to operations.

Closing matters because hunts that never close consume resources without producing improvements. Time-boxing each hunt (a week or two for most) prevents the team from spending months on one inconclusive question.

Improve detections

This is the step that compounds. Every hunt teaches you something about your data, your environment, or attacker behaviour. That learning should turn into a detection rule, a new data source, or a refined query that runs automatically going forward.

If a hunt found nothing, the queries that were built can often become saved searches that fire if the pattern ever does appear. If a hunt found something, the lessons feed both into a new detection and into the post-incident review.

A team that does not capture this output produces good investigations and a bad programme. A team that does capture it banks the value of every hunt, even the ones that found nothing.

How it works in practice

The cycle is clean. Real hunts are messier. A few patterns from how this actually plays out.

Hunts often start from threat intelligence. A new vendor report describes a campaign by a group active in your sector. The hunt is "given what this group does, would we see them if they were here?" This is one of the highest-value uses of CTI: not as a feed of alerts, but as a source of hunting hypotheses.

Living-off-the-land techniques produce the best hunts. Attackers using PowerShell, WMI, scheduled tasks, and built-in administration tools generate telemetry that looks legitimate at first glance. Hunting can surface the small fraction of that usage that is actually adversarial; signature-based detection cannot, because the binary itself is signed and legitimate.

Identity is the highest-value data source. Authentication logs, MFA approvals, OAuth grants, privileged access usage. Almost every modern intrusion shows up in identity data, often as the first usable signal.

The first hunts in a programme are usually noisy. Data sources are not yet tuned, queries return too much, analysis takes longer than expected. Programmes that quit after the first few unproductive hunts miss the point. The fifth hunt is much faster than the first.

Common starting points

Specific, repeatable hunts that mature programmes run regularly:

Suspicious DNS resolutions. DNS to known DGA-style domains, very recently registered domains, or domains whose traffic characteristics do not match their apparent purpose. DNS data is high-signal because most C2 channels resolve a domain at some point.
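A minimal sketch, assuming your DNS telemetry lands in a Sentinel-style DnsEvents table and that you maintain some newly-registered-domain enrichment; the NewlyRegisteredDomains watchlist here is hypothetical, so substitute whatever domain-age source you actually have:

    // Resolutions of recently registered domains, rarest first.
    let nrd = _GetWatchlist('NewlyRegisteredDomains')
        | project Domain = tolower(tostring(Domain));
    DnsEvents
    | where TimeGenerated > ago(7d)
    | extend Domain = tolower(Name)
    | join kind=inner (nrd) on Domain
    | summarize Queries = count(), Clients = dcount(ClientIP) by Domain
    | order by Clients asc, Queries asc   // one quiet client beats a thousand noisy ones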

Unusual authentication patterns. Logins from new countries, impossible travel, MFA approvals at odd times, bursts of failed logins followed by a success, OAuth grants to unfamiliar applications.
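One of these patterns, sketched in KQL against the Entra ID SigninLogs table; the thresholds are illustrative and need tuning to your tenant's baseline:

    // Bursts of failed sign-ins followed by a success in the same hour.
    SigninLogs
    | where TimeGenerated > ago(1d)
    | summarize Failures = countif(ResultType != "0"),
                Successes = countif(ResultType == "0"),
                SourceIPs = dcount(IPAddress)
      by UserPrincipalName, bin(TimeGenerated, 1h)
    | where Failures >= 20 and Successes > 0
    | order by Failures desc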

Signs of lateral movement. SMB or RDP connections between hosts that do not normally communicate, scheduled task creation across multiple machines, WMI execution patterns matching known TTPs, administrative shares used from non-administrator accounts.
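The "hosts that do not normally communicate" part is the crux, and a baseline comparison expresses it directly. A sketch against Defender's DeviceNetworkEvents, with an illustrative 30-day baseline:

    // SMB/RDP connections between host pairs not seen in the prior 30 days.
    let baseline = DeviceNetworkEvents
        | where Timestamp between (ago(31d) .. ago(1d))
        | where RemotePort in (445, 3389)
        | distinct DeviceName, RemoteIP;
    DeviceNetworkEvents
    | where Timestamp > ago(1d)
    | where RemotePort in (445, 3389)
    | join kind=leftanti (baseline) on DeviceName, RemoteIP   // keep only new pairs
    | summarize Connections = count() by DeviceName, RemoteIP, RemotePort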

Living-off-the-land binary usage. PowerShell with encoded commands, certutil downloading from external sources, bitsadmin transferring files, mshta executing remote scripts, regsvr32 calling out. Each has legitimate uses, which is why they are abused.
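A sketch of the endpoint side in KQL; the argument list is illustrative, not exhaustive, and every hit needs the legitimate-use check described above:

    // LOLBins invoked with network-fetch style arguments.
    DeviceProcessEvents
    | where Timestamp > ago(7d)
    | where FileName in~ ("certutil.exe", "bitsadmin.exe", "mshta.exe", "regsvr32.exe")
    | where ProcessCommandLine has_any ("http", "urlcache", "transfer", "scrobj.dll")
    | project Timestamp, DeviceName, AccountName, FileName, ProcessCommandLine, InitiatingProcessFileName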

Persistence mechanisms. New scheduled tasks, new services, new run keys, WMI event subscriptions. These are what attackers leave behind to survive reboots.
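The registry-based mechanisms are straightforward to sweep for. A sketch for Run/RunOnce keys against Defender's DeviceRegistryEvents (scheduled tasks, services, and WMI subscriptions would be parallel queries against their own telemetry):

    // New Run/RunOnce values set in the last day.
    DeviceRegistryEvents
    | where Timestamp > ago(1d)
    | where ActionType == "RegistryValueSet"
    | where RegistryKey has_any (@"\CurrentVersion\Run", @"\CurrentVersion\RunOnce")
    | project Timestamp, DeviceName, RegistryKey, RegistryValueName,
              RegistryValueData, InitiatingProcessFileName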

Specific attacker behaviours. Hunting for AsyncRAT-style beaconing patterns, AdminSDHolder modifications (a classic AD persistence technique), OAuth consent grants to attacker-controlled apps, or Kerberoasting (bulk TGS ticket requests for service accounts, with the tickets cracked offline against weak passwords).
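The Kerberoasting hunt in particular reduces to a clean query over Windows event 4769. A sketch against a Sentinel-style SecurityEvent table; the ten-service threshold is illustrative:

    // Possible Kerberoasting: many RC4 service-ticket requests from one account.
    SecurityEvent
    | where TimeGenerated > ago(1d)
    | where EventID == 4769
    | where TicketEncryptionType == "0x17"   // RC4; modern clients mostly negotiate AES
    | where ServiceName !endswith "$"        // ignore machine accounts
    | summarize Services = dcount(ServiceName), Requests = count()
      by TargetUserName, bin(TimeGenerated, 10m)
    | where Services >= 10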

Threat hunting maturity

The Sqrrl Hunting Maturity Model has five stages, HM0 through HM4: Initial (no hunting), Minimal (informal, when something feels off), Procedural (following published playbooks and threat intel reports), Innovative (developing your own hypotheses for your environment), and Leading (hunting driving the detection programme).

Most organisations sit at Level 1 or 2. The gap between Level 2 and 3 is significant because Level 3 requires hunters who can write their own queries, understand their environment, and develop hypotheses rather than running checklists from external sources.

Tools and data sources

Hunting requires good data. The minimum useful set:

  • EDR telemetry. Process creation, file creation, registry modification, network connection. Without process-level telemetry, hunting is heavily limited.
  • Authentication logs. AD, Entra ID, identity provider logs. Often the highest-signal source for modern attacks.
  • Network flow logs. NetFlow, VPC flow logs, firewall logs.
  • DNS logs. Resolution data, ideally enriched with first-seen and reputation context.
  • Cloud audit logs. AWS CloudTrail, Azure Activity, GCP Audit. Critical for cloud-focused hunts.

The query tools vary. SIEMs (Splunk, Sentinel, Elastic, Chronicle) are most common. EDR platforms have their own query interfaces (KQL for Microsoft Defender, SQL via osquery elsewhere). Some teams use notebook-based hunting (Jupyter with custom data connectors) for analytical work.

CTI informs what to hunt for. A good feed gives you the hypotheses. The hunting programme provides the analytic capability to test them. Without CTI, hunting defaults to the same standard hunts repeatedly. Without hunting, CTI gets read and filed.

Common mistakes

The patterns that turn hunting programmes into theatre.

  • Threat hunting as a synonym for "look at SIEM alerts". This is alert triage, not hunting. The defining feature of hunting is starting from a hypothesis, not from a triggered alert.
  • Hunts with no hypothesis. "Let's see if anything looks weird." It returns thousands of rows, none of them actionable, and the hunter quits after a day.
  • Hunts that never close. Open-ended investigations that drag on for weeks. Time-box every hunt.
  • No detection improvement output. Hunts that found nothing get filed and forgotten. Every hunt should produce either an incident, a saved detection, or a documented gap.
  • Hunting without the data. Trying to hunt without process telemetry, auth logs, or network data. The data needs to come first.
  • Hunting only on endpoint. Modern attacks are increasingly identity-driven and cloud-driven. Endpoint-only hunting misses most of the surface.
  • No measurement. Hunts run, findings get reported, but nobody tracks how the programme is doing over time.

Best practices

  • Always start with a hypothesis. Specific, testable, written down before the queries are built.
  • Time-box every hunt. A week or two. Force a decision at the end.
  • Capture the output of every hunt. Detection rule, saved query, data gap, runbook update. Something concrete every time.
  • Hunt across identity, not just endpoint. The data is where the attackers are.
  • Use CTI to source hypotheses. Industry-relevant intelligence produces more useful hunts than generic curiosity.
  • Maintain a hunt log. Every hunt: hypothesis, queries, results, output. Search the log before starting a new hunt.
  • Pair hunters with detection engineers. The handoff from "interesting query" to "production detection" should be hours, not weeks.
  • Practice on incidents. After every incident, hunt for the same TTPs in adjacent areas. Attackers rarely do something only once.
  • Test the data before you trust it. Confirm that the telemetry you think you have is actually flowing.

The teams that hunt well end up with a smaller delta between what their detections cover and what attackers actually do. The teams that do not have to wait for an alert or an external party to tell them they have been compromised.

ScruteX feeds your threat hunting with curated, sector-specific IOCs and TTPs, so your hunts are informed by what's actually targeting your industry today.
