Fraud detection systems fail less often because of model quality and more often because of data quality, feature freshness, and serving inconsistency. A model can score 0.95 AUC in notebooks and still miss real attacks in production if online features drift from training features or arrive too late.
In this guide, you will build a practical real-time feature pipeline for fraud detection using Apache Flink for stream processing, Feast for feature management, and XGBoost for low-latency scoring. The goal is not a toy demo, but an architecture you can adapt to production systems.
What we are building
- Ingest transaction and event streams.
- Compute rolling fraud features in Flink.
- Materialize consistent online/offline features with Feast.
- Train an XGBoost model from offline features.
- Serve online features and score transactions in milliseconds.
System architecture
At a high level, data flows like this:
- Raw events land in Kafka topics (transactions, logins, device_events).
- Flink computes windowed and entity-level aggregates in near real time.
- Computed features are written to an online store (Redis) and offline store (Parquet on object storage).
- Feast registers feature views and serves them consistently for training and inference.
- A scoring API fetches online features and runs XGBoost predictions.
Why this stack works
- Flink gives reliable event-time windows and stateful stream processing.
- Feast reduces training-serving skew by standardizing feature definitions.
- XGBoost remains a strong baseline for tabular fraud data with fast inference.
Step 1: Define your event schema and entity keys
Fraud systems live or die by entity design. Make entity keys explicit and stable from day one.
{
  "transaction_id": "txn_92413",
  "user_id": "u_101",
  "card_hash": "9f3e...",
  "device_id": "d_44",
  "ip": "49.37.21.10",
  "amount": 17899.0,
  "currency": "INR",
  "merchant_id": "m_19",
  "event_time": "2026-04-15T04:20:10Z"
}
Common entity keys for fraud:
user_id, card_hash, device_id, ip
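Stability matters in practice: the same card must always map to the same key, or aggregates silently split across duplicates. A minimal sketch of deriving a stable card key, assuming a hypothetical stable_card_hash helper (a real system would use HMAC with a managed secret, not an inline salt):

```python
import hashlib

def stable_card_hash(pan: str, salt: str = "rotate-me") -> str:
    """Derive a stable, non-reversible card key from the PAN.
    Illustrative only: use HMAC with a managed secret in production."""
    return hashlib.sha256((salt + pan).encode("utf-8")).hexdigest()[:16]

# The same input always yields the same key, so features stay joinable.
assert stable_card_hash("4111111111111111") == stable_card_hash("4111111111111111")
```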
Step 2: Compute streaming features in Flink
Start with simple, high-signal features: transaction velocity, failed login bursts, and geolocation jumps.
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings

env = StreamExecutionEnvironment.get_execution_environment()
settings = EnvironmentSettings.in_streaming_mode()
t_env = StreamTableEnvironment.create(env, environment_settings=settings)

t_env.execute_sql("""
    CREATE TABLE transactions (
        transaction_id STRING,
        user_id STRING,
        amount DOUBLE,
        ip STRING,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'transactions',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json'
    )
""")

# 5-minute tumbling-window aggregates per user. Write this table to your
# online and offline sinks (e.g. Redis and Parquet) with execute_insert.
features = t_env.sql_query("""
    SELECT
        user_id,
        TUMBLE_END(event_time, INTERVAL '5' MINUTE) AS feature_ts,
        COUNT(*) AS tx_count_5m,
        SUM(amount) AS amount_sum_5m,
        AVG(amount) AS amount_avg_5m
    FROM transactions
    GROUP BY user_id, TUMBLE(event_time, INTERVAL '5' MINUTE)
""")
Keep your first release small. A compact set of reliable features beats dozens of unstable ones.
Step 3: Register feature views in Feast
Feast acts as the contract between data engineering and ML serving.
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

user = Entity(name="user_id", join_keys=["user_id"])

fraud_source = FileSource(
    path="s3://fraud-offline/features/user_5m.parquet",
    timestamp_field="feature_ts",
)

user_tx_5m = FeatureView(
    name="user_tx_5m",
    entities=[user],
    ttl=None,
    schema=[
        Field(name="tx_count_5m", dtype=Int64),
        Field(name="amount_sum_5m", dtype=Float32),
        Field(name="amount_avg_5m", dtype=Float32),
    ],
    online=True,
    source=fraud_source,
)
After defining your repo, run feast apply and materialize online features on a schedule.
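Registration (feast apply) runs once per repo change; materialization must run on a schedule that matches your window size. A cron-style sketch, with an illustrative repo path:

```shell
# crontab fragment: push new offline rows into the online store every
# 5 minutes, matching the 5-minute feature windows. Path is illustrative.
*/5 * * * * cd /srv/fraud-feast-repo && feast materialize-incremental "$(date -u +\%Y-\%m-\%dT\%H:\%M:\%S)"
```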
Step 4: Train XGBoost with point-in-time correct data
Use Feast historical retrieval to avoid leakage. Do not join labels to current features directly.
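Point-in-time correctness means each label row only sees feature values computed at or before its own timestamp. A toy pandas sketch of the semantics that historical retrieval enforces, using a backward as-of join:

```python
import pandas as pd

labels = pd.DataFrame({
    "user_id": ["u_101", "u_101"],
    "event_timestamp": pd.to_datetime(["2026-04-15 04:06", "2026-04-15 04:12"]),
    "is_fraud": [0, 1],
})
features = pd.DataFrame({
    "user_id": ["u_101", "u_101"],
    "feature_ts": pd.to_datetime(["2026-04-15 04:05", "2026-04-15 04:10"]),
    "tx_count_5m": [2, 1],
})

# Backward as-of join: each label picks the latest feature row at or
# before its own timestamp, never a future one.
training = pd.merge_asof(
    labels.sort_values("event_timestamp"),
    features.sort_values("feature_ts"),
    left_on="event_timestamp", right_on="feature_ts",
    by="user_id", direction="backward",
)
# The 04:06 label sees the 04:05 features, not the 04:10 ones.
```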
import pandas as pd
import xgboost as xgb
from feast import FeatureStore
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

store = FeatureStore(repo_path=".")

# entity_df must contain the join key (user_id), an event_timestamp
# column, and the is_fraud label for point-in-time retrieval.
entity_df = pd.read_parquet("labels/transaction_labels.parquet")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_tx_5m:tx_count_5m",
        "user_tx_5m:amount_sum_5m",
        "user_tx_5m:amount_avg_5m",
    ],
).to_df()

X = training_df[["tx_count_5m", "amount_sum_5m", "amount_avg_5m"]]
y = training_df["is_fraud"]

# Stratify so the rare fraud class appears at the same rate in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = xgb.XGBClassifier(
    n_estimators=200,
    max_depth=5,
    learning_rate=0.05,
    subsample=0.9,
    colsample_bytree=0.9,
)
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, probs))
model.save_model("fraud_xgb.json")
Step 5: Serve online features and score in real time
Your API should fetch features from the Feast online store and score them in a single request path.
from fastapi import FastAPI
from feast import FeatureStore
import xgboost as xgb
import numpy as np

app = FastAPI()
store = FeatureStore(repo_path=".")
model = xgb.XGBClassifier()
model.load_model("fraud_xgb.json")

@app.post("/score")
def score(user_id: str):
    fv = store.get_online_features(
        features=[
            "user_tx_5m:tx_count_5m",
            "user_tx_5m:amount_sum_5m",
            "user_tx_5m:amount_avg_5m",
        ],
        entity_rows=[{"user_id": user_id}],
    ).to_dict()
    # Missing features (new user, cold store) default to zero activity.
    row = np.array([[
        fv["tx_count_5m"][0] or 0,
        fv["amount_sum_5m"][0] or 0.0,
        fv["amount_avg_5m"][0] or 0.0,
    ]], dtype=float)
    p = float(model.predict_proba(row)[0][1])
    decision = "block" if p >= 0.92 else "review" if p >= 0.75 else "allow"
    return {"fraud_probability": p, "decision": decision}
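The two-threshold policy embedded in the endpoint is worth extracting into a pure function so it can be unit-tested and later tuned per channel. The thresholds mirror the ones above and are illustrative, not recommendations:

```python
def decide(p: float, block_at: float = 0.92, review_at: float = 0.75) -> str:
    """Map a fraud probability to an action via two thresholds."""
    if p >= block_at:
        return "block"
    if p >= review_at:
        return "review"
    return "allow"

assert decide(0.95) == "block"
assert decide(0.80) == "review"
assert decide(0.10) == "allow"
```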
Production guardrails you should add
1) Latency budgets
- Feature fetch P95 under 20 ms.
- Total scoring P95 under 80 ms.
- Fallback rules when online store is degraded.
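A minimal sketch of the fallback idea, assuming a caller-supplied fetch_features callable that may raise or time out; on failure we return neutral defaults and a degraded flag so downstream logic can switch to static rules:

```python
DEFAULTS = {"tx_count_5m": 0, "amount_sum_5m": 0.0, "amount_avg_5m": 0.0}

def fetch_with_fallback(fetch_features, user_id):
    """Return (features, degraded). A store outage must never fail scoring:
    fall back to neutral defaults and let the caller apply static rules."""
    try:
        fv = fetch_features(user_id)
        # Treat partial nulls as degraded too, filling only the missing keys.
        degraded = any(fv.get(k) is None for k in DEFAULTS)
        return {k: fv[k] if fv.get(k) is not None else DEFAULTS[k]
                for k in DEFAULTS}, degraded
    except Exception:
        return dict(DEFAULTS), True

def _redis_down(user_id):
    raise TimeoutError("online store unavailable")

features, degraded = fetch_with_fallback(_redis_down, "u_101")
# degraded is True and features equal DEFAULTS
```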
2) Feature quality checks
- Null-rate alerts by feature.
- Distribution drift checks per hour.
- Late event ratio and watermark lag tracking.
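The null-rate alert can start as a simple batch check over recent feature rows; the 5% threshold here is a hypothetical starting point, not a standard:

```python
import pandas as pd

def null_rate_alerts(df: pd.DataFrame, max_null_rate: float = 0.05) -> dict:
    """Return the features whose null rate exceeds the threshold."""
    rates = df.isna().mean()
    return {col: float(r) for col, r in rates.items() if r > max_null_rate}

recent = pd.DataFrame({
    "tx_count_5m": [3, 1, None, 2],      # 25% null: should be flagged
    "amount_avg_5m": [50.0, 20.0, 10.0, 35.0],
})
alerts = null_rate_alerts(recent)
```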
3) Model risk controls
- Thresholds by payment channel or risk segment.
- Champion-challenger model rollout.
- Human review queue for medium-confidence cases.
Common pitfalls
- Training-serving skew: mitigated by Feast-managed feature definitions.
- Feature leakage: prevented by point-in-time historical retrieval.
- Unbounded cardinality: cap user/device dimensions where possible.
- Overfitting to known fraud rings: keep recent holdout windows and retrain frequently.
Final checklist
Before go-live, verify these items:
- Backfill and online materialization produce aligned feature values.
- Model artifacts are versioned and reproducible.
- Feature and prediction logs are retained for audit.
- Incident runbooks cover data delays and false-positive spikes.
With this pattern, your fraud model becomes an operational system, not just a notebook experiment. You can start with a small feature set, ship quickly, and improve safely as traffic and fraud patterns evolve.