Credit card fraud poses a significant threat to the financial industry, leading to billions of dollars in losses every year. To combat this, machine learning models have been developed to detect and prevent fraudulent transactions in real time. In this article, we'll walk through the process of building a real-time credit card fraud detection system using FastAPI, a modern web framework for Python, and a Random Forest classifier trained on the popular Credit Card Fraud Detection Dataset from Kaggle.
The goal of this project is to create a web service that predicts the likelihood of a credit card transaction being fraudulent. The service accepts transaction data, preprocesses it, and returns a prediction along with the probability of fraud. This system is designed to be fast, scalable, and easy to integrate into existing financial systems.
The dataset used in this project is the Credit Card Fraud Detection Dataset from Kaggle, which contains 284,807 transactions, of which only 492 (about 0.17%) are fraudulent. With an imbalance this severe, a model can score high accuracy while missing most fraud, so the minority class is oversampled before training.
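To make the imbalance concrete, the short sketch below works through the arithmetic using the two counts quoted above. The helper names `fraud_rate` and `baseline_accuracy` are made up for this illustration; the second one shows the accuracy a trivial model would reach by labeling every transaction as legitimate.

```python
TOTAL_TRANSACTIONS = 284_807  # counts taken from the dataset description
FRAUD_TRANSACTIONS = 492


def fraud_rate(total: int, fraud: int) -> float:
    """Fraction of transactions labeled as fraud."""
    return fraud / total


def baseline_accuracy(total: int, fraud: int) -> float:
    """Accuracy of a trivial model that always predicts 'not fraud'."""
    return (total - fraud) / total


rate = fraud_rate(TOTAL_TRANSACTIONS, FRAUD_TRANSACTIONS)
baseline = baseline_accuracy(TOTAL_TRANSACTIONS, FRAUD_TRANSACTIONS)

print(f"Fraud rate: {rate:.4%}")
print(f"Always-legitimate baseline accuracy: {baseline:.4%}")
```

A baseline above 99.8% without catching a single fraud case is the reason precision and recall on the fraud class matter more here than raw accuracy.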
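The dataset is first split into training and testing sets, stratified on the label so that both contain fraud cases. A StandardScaler from scikit-learn is then fit on the training features only and applied to both sets, which keeps test-set statistics out of the preprocessing step. Given the imbalance, RandomOverSampler from imbalanced-learn is applied to the training set to balance the classes before training; the test set keeps its original distribution.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler

# X holds the feature columns and y the fraud labels (loaded beforehand)
# Split first; test_size=0.2 is an illustrative choice
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize features using training-set statistics only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Balance the training set
ros = RandomOverSampler(random_state=42)
X_resampled, y_resampled = ros.fit_resample(X_train_scaled, y_train)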
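Conceptually, random oversampling duplicates randomly chosen minority-class rows until the classes are the same size. The sketch below illustrates that idea with the standard library only. It is not imbalanced-learn's implementation, and `random_oversample` is a hypothetical helper written for this example on toy data.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=42):
    """Duplicate randomly chosen rows of smaller classes until all match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # Indices of the original rows that belong to this class
        indices = [i for i, y in enumerate(labels) if y == label]
        # Sample with replacement to make up the shortfall
        for _ in range(target - count):
            i = rng.choice(indices)
            out_rows.append(rows[i])
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: 8 legitimate transactions (0) and 2 fraudulent ones (1)
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels))
print(Counter(balanced_labels))
```

After resampling, each class has 8 rows, and the extra fraud rows are exact copies of the original two. Because the copies are identical, resampling is usually restricted to the training split so that duplicates of one row cannot end up on both sides of the evaluation.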
We train a Random Forest classifier, an ensemble method that tends to perform well on tabular data like this. The model is trained on the oversampled data and evaluated on the held-out test set using precision, recall, F1-score, and the AUC-ROC score. Accuracy is reported as well, but with so few fraud cases it says little on its own.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score

# Train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_resampled, y_resampled)

# Evaluate the model
y_pred = model.predict(X_test_scaled)
print(classification_report(y_test, y_pred))
print("AUC-ROC:", roc_auc_score(y_test, model.predict_proba(X_test_scaled)[:, 1]))
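For readers less familiar with the numbers in `classification_report`, precision and recall for the fraud class can be computed directly from the predictions. The following is a standard-library sketch on made-up labels; `precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up example: 3 real fraud cases; the model flags 2 of them plus 1 false alarm
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

In this toy case, precision and recall are both 2/3 while accuracy is 0.8, which shows how accuracy can look better than the fraud-specific metrics even on a small sample.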
With the trained model and scaler saved using joblib, we move on to building the FastAPI application. FastAPI is chosen for its speed and ease of use, making it ideal for real-time applications.
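The saving step is not shown in the article, so here is a minimal sketch of how it might look with `joblib.dump`. The tiny synthetic dataset and the temporary output directory are assumptions made to keep the example self-contained; the file names match the ones the API code loads.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Tiny synthetic stand-in for the real training data (illustration only)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(40, 3))
y_demo = np.array([0, 1] * 20)

scaler = StandardScaler().fit(X_demo)
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(scaler.transform(X_demo), y_demo)

# Written to a temporary directory here; the API loads these file names
# from its working directory
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "random_forest_model.pkl")
scaler_path = os.path.join(out_dir, "scaler.pkl")
joblib.dump(model, model_path)
joblib.dump(scaler, scaler_path)

# Reload and check that the round trip preserves predictions
reloaded_model = joblib.load(model_path)
reloaded_scaler = joblib.load(scaler_path)
same = np.array_equal(
    model.predict(scaler.transform(X_demo)),
    reloaded_model.predict(reloaded_scaler.transform(X_demo)),
)
print("Predictions identical after reload:", same)
```

Saving the scaler alongside the model matters because the API must apply exactly the same transformation that was used during training. It is also safest to load the files with the same scikit-learn version that produced them.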
The FastAPI application defines a POST endpoint /predict/ that accepts transaction data, processes it, and returns the model's prediction and probability.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import pandas as pd

# Load the trained model and scaler
model = joblib.load("random_forest_model.pkl")
scaler = joblib.load("scaler.pkl")

app = FastAPI()


class Transaction(BaseModel):
    V1: float
    V2: float
    # Include all other features used in your model
    Amount: float


@app.post("/predict/")
def predict(transaction: Transaction):
    try:
        # Build a one-row DataFrame; on Pydantic v2, model_dump() replaces dict()
        data = pd.DataFrame([transaction.dict()])
        scaled_data = scaler.transform(data)
        prediction = model.predict(scaled_data)
        prediction_proba = model.predict_proba(scaled_data)
        return {
            "fraud_prediction": int(prediction[0]),
            "probability": float(prediction_proba[0][1]),
        }
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))
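One detail worth guarding against is feature order: the scaler and model expect columns in the same order that was used during training, and a mismatch can go unnoticed if it is not checked. Below is a minimal sketch of an explicit check. `FEATURE_ORDER` and `to_feature_row` are hypothetical names, and the three-feature list is a placeholder rather than the model's real feature set.

```python
# Placeholder list; in practice this should match the training columns exactly
FEATURE_ORDER = ["V1", "V2", "Amount"]


def to_feature_row(payload: dict, feature_order=FEATURE_ORDER) -> list:
    """Arrange a request payload into the column order used at training time."""
    missing = [name for name in feature_order if name not in payload]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    unexpected = [name for name in payload if name not in feature_order]
    if unexpected:
        raise ValueError(f"Unexpected features: {unexpected}")
    return [float(payload[name]) for name in feature_order]


# Keys arrive in a different order than the model expects
row = to_feature_row({"Amount": 149.62, "V1": -1.359807134, "V2": -0.072781173})
print(row)
```

A helper like this could run inside the endpoint before scaling, so that malformed requests fail with a clear message instead of producing a prediction from misaligned columns.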
To test the application locally, you can run the FastAPI server using uvicorn and send POST requests to the /predict/ endpoint. The service will process incoming requests, scale the data, and return whether the transaction is fraudulent.
uvicorn main:app --reload
You can then test the API using curl or a tool like Postman:
curl -X POST http://127.0.0.1:8000/predict/ \
  -H "Content-Type: application/json" \
  -d '{"V1": -1.359807134, "V2": -0.072781173, ..., "Amount": 149.62}'
The API will return a JSON object with the fraud prediction and the associated probability.
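The same request can be sent from Python. The sketch below uses only the standard library and assumes the server is running locally on the address from the curl example; `build_request` and `predict` are hypothetical helper names, and the payload is truncated to three fields for illustration.

```python
import json
import urllib.request

# URL from the curl example; adjust host and port to your deployment
PREDICT_URL = "http://127.0.0.1:8000/predict/"


def build_request(transaction: dict, url: str = PREDICT_URL) -> urllib.request.Request:
    """Create a JSON POST request for the /predict/ endpoint without sending it."""
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(transaction: dict, url: str = PREDICT_URL, timeout: float = 5.0) -> dict:
    """Send the transaction and return the decoded JSON response."""
    request = build_request(transaction, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Truncated payload for illustration; a real request needs every model feature
example = {"V1": -1.359807134, "V2": -0.072781173, "Amount": 149.62}
request = build_request(example)
print(request.get_method(), request.full_url)

# With the server running:
# result = predict(example)
# print(result["fraud_prediction"], result["probability"])
```

If the server rejects the payload, `urlopen` raises `urllib.error.HTTPError`, so callers may want to catch that and inspect the `detail` field of the error body.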
In this article, we've built a real-time credit card fraud detection system that combines machine learning with a modern web framework. The source code is available on GitHub. The system is designed to handle real-time transaction data and provide instant predictions, making it a valuable tool for financial institutions looking to combat fraud.
Because the endpoint is a regular `def` function, FastAPI runs it in a worker thread pool, so the service can handle multiple requests concurrently; throughput under real load still depends on how it is deployed. This project can be further extended with more sophisticated models, improved feature engineering, or integration with a production environment.