This blog post guides you through building a weather data analytics pipeline using the OpenWeatherMap API and AWS services. The pipeline fetches weather data, stores it in S3, catalogs it with AWS Glue, and allows querying with Amazon Athena.
Project Overview
This project creates a scalable data pipeline for fetching weather data from multiple cities, storing it in AWS S3, cataloging it via AWS Glue, and enabling querying using Amazon Athena.
Initial Architecture & Architecture Diagrams
Project Structure & Prerequisites
Before starting, ensure you have:
Setup Guide
Clone the Repository:
<code class="language-bash">git clone https://github.com/Rene-Mayhrem/weather-insights.git
cd weather-insights</code>
Create a .env File: Create a .env file in the root directory with your AWS credentials and API key:
<code>AWS_ACCESS_KEY_ID=<your-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
AWS_REGION=us-east-1
S3_BUCKET_NAME=<your-s3-bucket-name>
OPENWEATHER_API_KEY=<your-openweather-api-key></code>
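A missing or placeholder value in .env tends to surface later as an opaque AWS or API error, so it can help to check the file before running anything. The following is a minimal sketch, not the repository's own code: the helper names `parse_env_text` and `missing_keys` are hypothetical, and the parser only handles plain KEY=VALUE lines (no quoting or variable expansion).

```python
# Keys taken from the .env example above.
REQUIRED_KEYS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "S3_BUCKET_NAME",
    "OPENWEATHER_API_KEY",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_keys(env):
    """Return required keys that are absent, empty, or still a <placeholder>."""
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing
```

Calling `missing_keys(parse_env_text(open(".env").read()))` from the project root should return an empty list once every value has been filled in.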
Create cities.json: Create cities.json listing the cities:
<code class="language-json">{
  "cities": [
    "London",
    "New York",
    "Tokyo",
    "Paris",
    "Berlin"
  ]
}</code>
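To illustrate how a file like this can drive the fetch step, here is a hedged sketch that reads the city list and builds one request URL per city. It assumes the script calls OpenWeatherMap's public current-weather endpoint with the `q`, `appid`, and `units` query parameters; the post does not say which endpoint the repository actually uses, and the function names `load_cities` and `build_request_url` are hypothetical.

```python
import json
from urllib.parse import urlencode

# OpenWeatherMap current-weather endpoint. That the repository's script
# uses this exact endpoint is an assumption.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def load_cities(json_text):
    """Return the list under the "cities" key, validating its shape."""
    cities = json.loads(json_text)["cities"]
    if not isinstance(cities, list) or not all(isinstance(c, str) for c in cities):
        raise ValueError('"cities" must be a list of strings')
    return cities


def build_request_url(city, api_key, units="metric"):
    """Build a request URL; urlencode escapes spaces in names like "New York"."""
    query = urlencode({"q": city, "appid": api_key, "units": units})
    return f"{BASE_URL}?{query}"
```

Adding a city to the pipeline then only requires a new entry in cities.json; no code change is needed in a loop such as `for city in load_cities(text): build_request_url(city, key)`.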
Docker Compose: Build and run:
<code class="language-bash">docker compose run terraform init
docker compose run python</code>
Usage
Verify Infrastructure: Check if Terraform created the AWS resources (S3, Glue database, Glue crawler) in the AWS console.
Verify Data Upload: Confirm the Python script uploaded weather data (JSON files) to your S3 bucket via the AWS console.
Run Glue Crawler: The Glue crawler should run automatically; verify its execution and data cataloging in the Glue console.
Query with Athena: Use the AWS Management Console to access Athena and run SQL queries on the cataloged data.
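The console checks above can also be scripted. The sketch below is one possible approach under stated assumptions, not part of the repository: each function takes an already-created boto3 client (for example `boto3.client("s3")`, `boto3.client("glue")`, `boto3.client("athena")`), and the bucket, crawler, database, and result-location values passed in are placeholders for whatever the Terraform configuration created.

```python
def count_uploaded_objects(s3_client, bucket, prefix=""):
    """Count objects under a prefix (first page only, at most 1000 keys)."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return response.get("KeyCount", 0)


def crawler_state(glue_client, crawler_name):
    """Return (current state, status of the last crawl or None)."""
    crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
    last_status = crawler.get("LastCrawl", {}).get("Status")
    return crawler["State"], last_status


def start_athena_query(athena_client, sql, database, output_location):
    """Submit a query and return its execution id; results land in S3."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Passing the clients in as arguments keeps the helpers easy to exercise with stand-in objects before pointing them at a real AWS account. A crawler state of `READY` with a last crawl status of `SUCCEEDED` suggests the tables should be visible to Athena.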
Key Components
Conclusion
This guide helps you build a scalable weather data analytics pipeline using AWS and OpenWeatherMap. The pipeline can be easily extended to include more cities or data sources.