


Building a Real-Time Credit Card Fraud Detection System with FastAPI and Machine Learning
Introduction
Credit card fraud poses a significant threat to the financial industry, leading to billions of dollars in losses every year. To combat this, machine learning models have been developed to detect and prevent fraudulent transactions in real time. In this article, we'll walk through the process of building a real-time credit card fraud detection system using FastAPI, a modern web framework for Python, and a Random Forest classifier trained on the popular Credit Card Fraud Detection Dataset from Kaggle.
Overview of the Project
The goal of this project is to create a web service that predicts the likelihood of a credit card transaction being fraudulent. The service accepts transaction data, preprocesses it, and returns a prediction along with the probability of fraud. This system is designed to be fast, scalable, and easy to integrate into existing financial systems.
Key Components
- Machine Learning Model: A Random Forest classifier trained to distinguish between fraudulent and legitimate transactions.
- Data Preprocessing: Standardization of transaction features, applied identically at training and prediction time so the model always sees inputs on the same scale.
- API: A RESTful API built with FastAPI to handle prediction requests in real time.
Step 1: Preparing the Dataset
The dataset used in this project is the Credit Card Fraud Detection Dataset from Kaggle, which contains 284,807 transactions, of which only 492 (about 0.17%) are fraudulent. This class imbalance presents a challenge, which is addressed here by oversampling the minority class in the training data.
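To make the imbalance concrete, the short calculation below uses only the two counts quoted above. The "always legitimate" baseline is a hypothetical classifier introduced for illustration; it is not part of the project.

```python
# Counts quoted in the article for the Kaggle dataset
total_transactions = 284_807
fraud_transactions = 492

fraud_rate = fraud_transactions / total_transactions

# Hypothetical baseline that labels every transaction as legitimate
baseline_accuracy = (total_transactions - fraud_transactions) / total_transactions
baseline_recall = 0 / fraud_transactions  # it never flags a fraudulent transaction

print(f"Fraud rate:        {fraud_rate:.4%}")
print(f"Baseline accuracy: {baseline_accuracy:.4%}")
print(f"Baseline recall:   {baseline_recall:.0%}")
```

A model that catches no fraud at all still scores very high on accuracy here, which is why precision, recall, and AUC-ROC are the more informative metrics for this dataset.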
Data Preprocessing
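The dataset is first split into training and testing sets. The features are then standardized with scikit-learn's StandardScaler, which is fit on the training split only so that no information from the test set leaks into preprocessing. Given the imbalance, RandomOverSampler is applied to the training split to balance the classes before the model is trained; the test set keeps its original class distribution. The split ratio is not stated in the original write-up, so the 80/20 stratified split below is an assumption.

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from imblearn.over_sampling import RandomOverSampler

    # Hold out a stratified test set (X: feature columns, y: fraud labels)
    # test_size=0.2 is an assumed value
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    # Standardize features; fit on the training split only
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Balance the training set only
    ros = RandomOverSampler(random_state=42)
    X_resampled, y_resampled = ros.fit_resample(X_train_scaled, y_train)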
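For readers unfamiliar with random oversampling, the following plain-Python sketch illustrates the idea: minority-class rows are duplicated at random until the classes are the same size. It is an illustration of the technique only, not imblearn's implementation, and `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=42):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy training split: 8 legitimate rows (0) and 2 fraudulent rows (1)
train_rows = [[float(i)] for i in range(10)]
train_labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
print(Counter(balanced_labels))
```

Because the added rows are exact copies, oversampling should only ever touch the training data; copies that end up in a test set would inflate the measured performance.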
Step 2: Training the Machine Learning Model
We train a Random Forest classifier, an ensemble of decision trees that tends to give robust predictions on tabular data. The model is trained on the oversampled training data and evaluated on the held-out test set using precision, recall, F1 score, and AUC-ROC. Accuracy is reported as well, but with so few fraudulent transactions it says little on its own.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report, roc_auc_score

    # Train the model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_resampled, y_resampled)

    # Evaluate the model
    y_pred = model.predict(X_test_scaled)
    print(classification_report(y_test, y_pred))
    print("AUC-ROC:", roc_auc_score(y_test, model.predict_proba(X_test_scaled)[:, 1]))
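As a reminder of what the precision and recall figures in the classification report mean, here is a small hand computation for the positive (fraud) class. The labels are made up for illustration and `precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 1 = fraud, 0 = legitimate
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the transactions flagged as fraud, how many really were?", while recall answers "of the real fraud cases, how many were flagged?". In fraud detection the two usually have to be traded off against each other.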
Step 3: Building the FastAPI Application
With the trained model and scaler saved using joblib, we move on to building the FastAPI application. FastAPI is chosen for its speed and ease of use, making it ideal for real-time applications.
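The saving step is not shown in the article, so the following is a minimal sketch of how it might look. The file names match the ones the API loads later; the tiny training data and the temporary output directory are stand-ins chosen so the example runs on its own.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Tiny made-up data standing in for the real training split
X_train = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [5.0, 5.0]]
y_train = [0, 0, 0, 1]

scaler = StandardScaler().fit(X_train)
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(scaler.transform(X_train), y_train)

# File names as used by the API; a temporary directory is assumed here
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "random_forest_model.pkl")
scaler_path = os.path.join(out_dir, "scaler.pkl")
joblib.dump(model, model_path)
joblib.dump(scaler, scaler_path)

# Reload both artifacts to confirm they round-trip
loaded_model = joblib.load(model_path)
loaded_scaler = joblib.load(scaler_path)
print(loaded_model.predict(loaded_scaler.transform([[5.0, 5.0]])))
```

Saving the fitted scaler alongside the model matters: the API must apply exactly the same scaling that was used during training.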
Creating the API
The FastAPI application defines a POST endpoint /predict/ that accepts transaction data, processes it, and returns the model's prediction and probability.
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel
    import joblib
    import pandas as pd

    # Load the trained model and scaler once, at startup
    model = joblib.load("random_forest_model.pkl")
    scaler = joblib.load("scaler.pkl")

    app = FastAPI()

    # Declare fields in the same order as the columns used to fit the scaler
    class Transaction(BaseModel):
        V1: float
        V2: float
        # Include all other features used in your model
        Amount: float

    @app.post("/predict/")
    def predict(transaction: Transaction):
        try:
            # On Pydantic v2, model_dump() is the preferred name for dict()
            data = pd.DataFrame([transaction.dict()])
            scaled_data = scaler.transform(data)
            prediction = model.predict(scaled_data)
            prediction_proba = model.predict_proba(scaled_data)
            return {
                "fraud_prediction": int(prediction[0]),
                "probability": float(prediction_proba[0][1]),
            }
        except Exception as e:
            raise HTTPException(status_code=400, detail=str(e))
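The scaler and the model expect features in the same column order that was used during training. One way to make that ordering explicit is sketched below. `FEATURE_ORDER` and `to_feature_row` are hypothetical names, and the list contains only the three fields shown in the abbreviated model; a real list would name every feature the model uses.

```python
# Hypothetical column order; it must match the columns used to fit the scaler.
# Only the fields shown in the article's abbreviated model are listed.
FEATURE_ORDER = ["V1", "V2", "Amount"]


def to_feature_row(payload):
    """Return the payload's values as a list in training-column order."""
    missing = [name for name in FEATURE_ORDER if name not in payload]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return [float(payload[name]) for name in FEATURE_ORDER]


# Key order in the request body does not matter
row = to_feature_row({"Amount": 149.62, "V2": -0.072781173, "V1": -1.359807134})
print(row)
```

Keeping the order in one place means a reordered or incomplete request fails loudly instead of silently feeding the model misaligned values.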
Step 4: Deploying the Application
To test the application locally, save the code above as main.py, run the FastAPI server with uvicorn, and send POST requests to the /predict/ endpoint. The service validates each incoming request, scales the data, and returns the predicted class together with the estimated probability of fraud.
Running the API Locally
    uvicorn main:app --reload
You can then test the API using curl or a tool like Postman:
    curl -X POST http://127.0.0.1:8000/predict/ \
      -H "Content-Type: application/json" \
      -d '{"V1": -1.359807134, "V2": -0.072781173, ..., "Amount": 149.62}'
The API will return a JSON object with the fraud prediction and the associated probability.
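The same request can be sent from Python. The sketch below uses only the standard library and assumes the server is running at the local address from the curl example. `build_request` and `predict` are hypothetical helper names, and the payload is abbreviated to the three fields shown in the article.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/predict/"  # local address from the curl example


def build_request(transaction):
    """Build a JSON POST request for the /predict/ endpoint."""
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(transaction):
    """Send the transaction and return the decoded JSON response.
    Requires the API server to be running."""
    with urllib.request.urlopen(build_request(transaction), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Abbreviated payload; a real request needs every feature the model uses
request = build_request({"V1": -1.359807134, "V2": -0.072781173, "Amount": 149.62})
print(request.get_method(), request.full_url)
```

With the server running, calling `predict(...)` with a complete payload should return a dictionary with the `fraud_prediction` and `probability` keys produced by the endpoint.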
Conclusion
In this article, we've built a real-time credit card fraud detection system that combines machine learning with a modern web framework. The system accepts transaction data over HTTP and returns a prediction immediately, which makes it a useful starting point for financial institutions looking to combat fraud.
Serving the model with FastAPI keeps the request path lightweight, and the application can handle multiple requests concurrently when run behind an ASGI server such as uvicorn. This project can be further extended with more sophisticated models, improved feature engineering, or integration with a production environment.
Next Steps
To enhance the system further, consider the following:
- Model Improvements: Experiment with more advanced models like XGBoost or neural networks.
- Feature Engineering: Explore additional features that might improve model accuracy.
- Real-World Deployment: Deploy the application on cloud platforms like AWS or GCP for production use.