
MISP to Splunk Integration

Send threat intelligence from MISP into Splunk so analysts can pivot on IOCs inside their normal hunting workflow, without leaving the SIEM. The integration pulls newly added attributes from MISP events and forwards them to Splunk's HTTP Event Collector (HEC) with the original MISP timestamp preserved.

The right architectural choice here is HEC (the dedicated ingestion endpoint), not the management REST API. HEC is designed for high-volume programmatic ingestion, accepts token auth, and supports batched NDJSON payloads. Per Rinox's Phase 1 (Right Endpoint Check), this rules out `/services/receivers/simple`.

What you get

A real excerpt from a generated MISP-to-Splunk integration. The full script ships with logging, env-var loading, error handling, FIFO-capped dedup state, and a README.

misp_to_splunk.py
# ============================================================
# RINOX INTEGRATION: MISP -> Splunk
# ============================================================
import os, json, time, logging
from collections import deque

import requests  # third-party HTTP client used by fetch_attributes() and deliver()

# SECTION 1: LOGGING
logger = logging.getLogger("rinox.misp_to_splunk")
logging.basicConfig(level=os.getenv("LOG_LEVEL", "INFO"), format="%(asctime)s %(levelname)s %(name)s %(message)s")

# SECTION 2: AUTHENTICATION
MISP_URL    = os.environ["MISP_URL"]
MISP_KEY    = os.environ["MISP_API_KEY"]
HEC_URL     = os.environ["SPLUNK_HEC_URL"]
HEC_TOKEN   = os.environ["SPLUNK_HEC_TOKEN"]

# SECTION 3: SOURCE — MISP attribute search since last cursor
def fetch_attributes(since_ts: int):
    # /attributes/restSearch returns attributes with their parent event ID and
    # the attribute's own timestamp — we use those, not the event timestamp.
    body = {"timestamp": since_ts, "returnFormat": "json"}
    resp = requests.post(f"{MISP_URL}/attributes/restSearch",
                         headers={"Authorization": MISP_KEY, "Accept": "application/json"},
                         json=body, timeout=30)
    resp.raise_for_status()
    return resp.json().get("response", {}).get("Attribute", [])

# SECTION 4: TRANSLATE — child-level dedup, preserve source timestamp
def translate(attrs, seen_ids):
    payload, new_ids, max_ts = [], [], 0
    for a in attrs:
        aid = a["uuid"]
        if aid in seen_ids:
            continue
        ts = int(a["timestamp"])  # cast — never compare timestamps as strings
        payload.append({
            "time": ts,
            "sourcetype": "misp:attribute",
            "event": {"value": a["value"], "type": a["type"], "category": a["category"], "event_id": a["event_id"]},
        })
        new_ids.append(aid)
        if ts > max_ts:
            max_ts = ts
    return payload, new_ids, max_ts

# SECTION 5: DELIVER — NDJSON to HEC
def deliver(events):
    body = "\n".join(json.dumps(e) for e in events)
    resp = requests.post(HEC_URL,
                         headers={"Authorization": f"Splunk {HEC_TOKEN}"},
                         data=body, timeout=60)
    return 200 <= resp.status_code < 300

# SECTION 7: ORCHESTRATOR — state saved ONLY after HEC ACK

Common pitfalls

The mistakes that turn this integration from "works once" into "loses data silently for three weeks."

  • 01. Track child attribute UUIDs in state, not parent MISP event UUIDs. Long-lived events accumulate new attributes for weeks, so deduping by event UUID makes the integration miss everything added after the first run.
  • 02. Splunk HEC wants NDJSON, not pretty-printed JSON. Bash pipelines that use `jq` must pass `-c`. Python clients should join events with `\n` between objects, not commas inside an array.
  • 03. Preserve the MISP attribute timestamp as the Splunk `time` field. Defaulting to `time.time()` destroys timeline correlation when older IOCs are added to historical events.
  • 04. Authenticate to MISP with an API key, not a session cookie. An API key is rotated on your schedule; a session cookie expires on its own and breaks cron jobs at 3 a.m.
  • 05. Respect MISP rate limits and Splunk HEC `429` responses. Use exponential backoff with jitter, and never advance the state cursor until HEC acknowledges the batch.
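
The backoff advice in the last pitfall can be sketched as follows. This is a minimal illustration, not the generated script's retry logic: `send_with_backoff`, `backoff_delay`, the `RETRYABLE` set, and the default base and cap values are all assumptions. `send` is any zero-argument callable that performs one delivery attempt and returns the HTTP status code.

```python
import random
import time
from typing import Callable

# Assumption: these statuses are treated as transient and worth retrying.
RETRYABLE = {429, 500, 502, 503, 504}

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay: a random fraction of min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))

def send_with_backoff(send: Callable[[], int], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> bool:
    """Call send() until it returns 2xx, a non-retryable status, or attempts run out."""
    for attempt in range(max_attempts):
        status = send()
        if 200 <= status < 300:
            return True   # delivered; the caller may now advance its cursor
        if status not in RETRYABLE:
            return False  # e.g. 400 or 401: retrying will not help
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, rng=rng))
    return False          # still failing; the caller must not advance its cursor
```

Injecting `sleep` and `rng` keeps the function testable without real waiting. A caller would pass something like `lambda: requests.post(...).status_code` as `send`.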

Frequently asked

  • Why HEC and not the REST receivers endpoint?

    HEC is the ingestion endpoint Splunk designed for programmatic, high-volume writes with token auth. The receivers endpoint is part of the management REST API and is the wrong tool for bulk threat-intelligence ingestion. Rinox enforces this in Phase 1 of the IRON standard.

  • How does the integration handle MISP events updated days after creation?

    It tracks the child attribute UUID, not the parent event UUID. New attributes added to an existing event are picked up on the next run because each one carries its own UUID, which is not yet in the dedup state.
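
The full script is described as keeping FIFO-capped dedup state. One way such a cache could look is sketched below; `SeenCache` and its `maxlen` default are hypothetical names and values, not taken from the generated script. It supports the `in` check that `translate` performs on `seen_ids`.

```python
from collections import deque

class SeenCache:
    """Remembers the most recent attribute UUIDs, evicting the oldest first."""

    def __init__(self, maxlen=100_000, initial=()):
        if maxlen < 1:
            raise ValueError("maxlen must be at least 1")
        self._order = deque(maxlen=maxlen)  # insertion order, oldest on the left
        self._members = set()               # O(1) membership checks
        for uid in initial:
            self.add(uid)

    def __contains__(self, uid):
        return uid in self._members

    def add(self, uid):
        if uid in self._members:
            return
        if len(self._order) == self._order.maxlen:
            # The deque drops its leftmost item on append; mirror that in the set.
            self._members.discard(self._order[0])
        self._order.append(uid)
        self._members.add(uid)

    def to_list(self):
        """Serializable form for a JSON state file, oldest first."""
        return list(self._order)
```

The cap trades memory for a small risk: a UUID evicted from the cache could be re-sent if MISP returns it again, so the cap should comfortably exceed the number of attributes in one fetch window.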

  • What happens if HEC returns 503 mid-batch?

    The state cursor is not saved. The next run re-fetches the same window and retries delivery. Rinox's Appendix D rule: conditional advancement on explicit ACK.
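
A minimal sketch of that rule, assuming a JSON state file with `cursor` and `seen` keys (the file layout and the `load_state`, `save_state`, and `run_once` names are illustrative, not the generated script's). The `fetch`, `translate`, and `deliver` arguments stand in for the Section 3, 4, and 5 functions.

```python
import json
import os
import tempfile

def load_state(path):
    """Return saved state, or a fresh one if no state file exists yet."""
    if not os.path.exists(path):
        return {"cursor": 0, "seen": []}
    with open(path) as fh:
        return json.load(fh)

def save_state(path, state):
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)

def run_once(path, fetch, translate, deliver):
    """One fetch/translate/deliver cycle; returns True only if state advanced."""
    state = load_state(path)
    attrs = fetch(state["cursor"])
    payload, new_ids, max_ts = translate(attrs, set(state["seen"]))
    if not payload:
        return False  # nothing new in this window
    if not deliver(payload):
        return False  # no ACK: state file is left untouched, next run retries
    state["cursor"] = max(state["cursor"], max_ts)
    state["seen"].extend(new_ids)
    save_state(path, state)
    return True
```

Because the cursor and the seen list are written together and only after `deliver` succeeds, a failed run repeats the same window instead of skipping it.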

  • Can I run this as a one-off backfill?

    Yes — pass `--initial` (or set `MISP_SINCE=0`) to ignore the state file and pull from the beginning. The generated script writes a state file after the first successful delivery so subsequent runs are incremental.
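
How the starting cursor might be resolved is sketched below. Only the `--initial` flag and `MISP_SINCE=0` come from the answer above; the `resolve_since` helper, the precedence order, and accepting non-zero `MISP_SINCE` values are assumptions made for illustration.

```python
import argparse

def resolve_since(argv, env, saved_cursor):
    """Pick the timestamp to fetch from: flag first, then env var, then saved state."""
    parser = argparse.ArgumentParser(prog="misp_to_splunk.py")
    parser.add_argument("--initial", action="store_true",
                        help="ignore saved state and pull from the beginning")
    args = parser.parse_args(argv)
    if args.initial:
        return 0
    if "MISP_SINCE" in env:
        # Assumption: any integer epoch is accepted, not just 0.
        return int(env["MISP_SINCE"])
    # No override: continue incrementally, or start from 0 on a first run.
    return saved_cursor if saved_cursor is not None else 0
```

In a real entry point this would be called as `resolve_since(sys.argv[1:], os.environ, state_cursor)`. A backfill that ignores the cursor should also ignore the saved dedup list, otherwise already-seen attributes are filtered out again.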

  • How do I add custom fields?

    Edit the `translate` function in Section 4. Field mapping lives there by design — never in Section 3 (fetch) or Section 5 (deliver).
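
As an illustration, the per-attribute mapping could be pulled into a helper that `translate` calls. The sketch below is hypothetical: `to_hec_event` and `extra_fields` are not names from the generated script. `comment` and `to_ids` are standard MISP attribute fields, copied only when present.

```python
def to_hec_event(a, extra_fields=("comment", "to_ids")):
    """Map one MISP attribute dict to a Splunk HEC event dict."""
    event = {
        "value": a["value"],
        "type": a["type"],
        "category": a["category"],
        "event_id": a["event_id"],
    }
    # Custom fields: copied through if the attribute has them.
    for name in extra_fields:
        if name in a:
            event[name] = a[name]
    return {
        "time": int(a["timestamp"]),  # keep the MISP attribute timestamp
        "sourcetype": "misp:attribute",
        "event": event,
    }
```

Keeping the mapping in one function means fetch and deliver never need to know which fields exist, which is the separation the answer above describes.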