Local Workflow: Orchestrating Data Ingestion into Airtable


Introduction

The whole data life-cycle starts with generating data and storing it in some way, somewhere. Let's call this the early-stage data life-cycle, and we'll explore how to automate data ingestion into Airtable using a local workflow. We'll cover setting up the development environment, designing the ingestion process, creating a batch script and scheduling the workflow - keeping things simple, local/reproducible and accessible.
First, let's talk about Airtable. Airtable is a powerful and flexible tool that combines the simplicity of a spreadsheet with the structure of a database. I find it perfect for organizing information, managing projects and tracking tasks - and it has a free tier!

Setting Up the Environment

Setting up the development environment

We'll be building this project in Python, so launch your favourite IDE and create a virtual environment:

# from your terminal
python -m venv <environment_name>
<environment_name>\Scripts\activate

To get started with Airtable, head over to the Airtable website. Once you've signed up for a free account, you'll need to create a new Workspace. Think of a Workspace as a container for all of your related tables and data.

Next, create a new Table inside your Workspace. A Table is essentially a spreadsheet where you'll store your data. Define the Fields (columns) in your Table to match the structure of your data.

Here's a snapshot of the fields used in this tutorial; they're a mix of Text, Date and Number types:

[Screenshot: the Airtable fields used in the tutorial (ID, Name, Age, Email, Department, Salary, Phone, Address, Date Added, Status, Years of Experience)]

To connect your script to Airtable, you'll need to generate an API Key or Personal Access Token. This key acts as a password, allowing your script to interact with your Airtable data. To generate one, navigate to your Airtable account settings, find the API section and follow the instructions to create a new key.

*Remember to keep your API key secure. Avoid sharing it publicly or committing it to public repositories.*

Installing the required dependencies (Python, libraries, etc.)

Next, create a requirements.txt file (touch requirements.txt). Inside this .txt file, add the following packages:

pyairtable
schedule
faker
python-dotenv

Now run pip install -r requirements.txt to install the required packages.

Organizing the project structure

This is the step where we create our scripts: .env is where we'll keep our credentials, autoRecords.py generates random data for the defined fields, and ingestData.py inserts the records into Airtable.
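A minimal layout (the project folder name is just an example) could look like this:

airtable_ingestion/
├── .env             # credentials: API_KEY, BASE_ID, TABLE_NAME
├── autoRecords.py   # generates random records for the defined fields
├── ingestData.py    # inserts the records into Airtable
└── requirements.txt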

Designing the Ingestion Process: Environment Variables

[Screenshot: locating the Base ID and Table ID in the Airtable URL]

"https://airtable.com/app########/tbl######/viw####?blocks=show"
BASE_ID = 'app########'
TABLE_NAME = 'tbl######'
API_KEY = '#######'

Designing the Ingestion Process: Automated Records


Generating Realistic Employee Data for Your Projects

When working on projects that involve employee data, it's often helpful to have a reliable way to generate realistic sample data. Whether you're building an HR management system, an employee directory, or anything in between, having access to robust test data can streamline your development and make your application more resilient.

In this section, we'll explore a Python script that generates random employee records with a variety of relevant fields. This tool can be a valuable asset when you need to populate your application with realistic data quickly and easily.

Generating Unique IDs

The first step in our data generation process is to create unique identifiers for each employee record. This is an important consideration, as your application will likely need a way to uniquely reference each individual employee. Our script includes a simple function to generate these IDs:

import random

def generate_unique_id():
    """Generate a Unique ID in the format N-#####"""
    return f"N-{random.randint(10000, 99999)}"

This function generates a unique ID in the format "N-#####", where the number is a random 5-digit value. You can customize this format to suit your specific needs.
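Purely random five-digit suffixes can collide once you generate many records; if you need stronger uniqueness guarantees, a hedged variant (not part of the original script) could lean on Python's uuid module instead:

import uuid

def generate_unique_id():
    """Generate an ID like N-1a2b3c4d from a UUID fragment, making collisions far less likely."""
    return f"N-{uuid.uuid4().hex[:8]}"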

Generating Random Employee Records

Next, let's look at the core function that generates the employee records themselves. The generate_random_records() function takes the number of records to create as input and returns a list of dictionaries, where each dictionary represents an employee with various fields:

from datetime import datetime, timedelta
from faker import Faker

fake = Faker()

def generate_random_records(num_records=10):
    """
    Generate random records with reasonable constraints
    :param num_records: Number of records to generate
    :return: List of records formatted for Airtable
    """
    records = []

    # Constants
    departments = ['Sales', 'Engineering', 'Marketing', 'HR', 'Finance', 'Operations']
    statuses = ['Active', 'On Leave', 'Contract', 'Remote']

    for _ in range(num_records):
        # Generate date in the correct format
        random_date = datetime.now() - timedelta(days=random.randint(0, 365))
        formatted_date = random_date.strftime('%Y-%m-%dT%H:%M:%S.000Z')

        record = {
            'fields': {
                'ID': generate_unique_id(),
                'Name': fake.name(),
                'Age': random.randint(18, 80),
                'Email': fake.email(),
                'Department': random.choice(departments),
                'Salary': round(random.uniform(30000, 150000), 2),
                'Phone': fake.phone_number(),
                'Address': fake.address().replace('\n', '\\n'),  # Escape newlines
                'Date Added': formatted_date,
                'Status': random.choice(statuses),
                'Years of Experience': random.randint(0, 45)
            }
        }
        records.append(record)

    return records

This function uses the Faker library to generate realistic-looking data for various employee fields, such as name, email, phone number, and address. It also includes some basic constraints, such as limiting the age range and salary range to reasonable values.

The function returns a list of dictionaries, where each dictionary represents an employee record in a format that is compatible with Airtable.

Preparing Data for Airtable

Finally, let's look at the prepare_records_for_airtable() function, which takes the list of employee records and extracts the 'fields' portion of each record. This is the format that Airtable expects for importing data:

def prepare_records_for_airtable(records):
    """Convert records from nested format to flat format for Airtable"""
    return [record['fields'] for record in records]

This function simplifies the data structure, making it easier to work with when integrating the generated data with Airtable or other systems.

Putting It All Together

To use this data generation tool, we can call the generate_random_records() function with the desired number of records, and then pass the resulting list to the prepare_records_for_airtable() function:

if __name__ == "__main__":
    records = generate_random_records(2)
    print(records)
    prepared_records = prepare_records_for_airtable(records)
    print(prepared_records)

This will generate 2 random employee records, print them in their original format, and then print the records in the flat format suitable for Airtable.

Run:

python autoRecords.py

Output:

## Raw Data
[{'fields': {'ID': 'N-11247', 'Name': 'Christine Cummings', 'Age': 22, 'Email': 'perezbill@example.net', 'Department': 'Finance', 'Salary': 149928.17, 'Phone': '(999)961-2703', 'Address': 'USNV Wheeler\\nFPO AP 08774', 'Date Added': '2024-11-06T15:50:39.000Z', 'Status': 'On Leave', 'Years of Experience': 8}}, {'fields': {'ID': 'N-48578', 'Name': 'Stephanie Owen', 'Age': 41, 'Email': 'nicholasharris@example.net', 'Department': 'Engineering', 'Salary': 56206.04, 'Phone': '981-354-1421', 'Address': '872 Shelby Neck Suite 854\\nSeanbury, IL 24904', 'Date Added': '2024-10-15T15:50:39.000Z', 'Status': 'Active', 'Years of Experience': 25}}]

## Tidied Up Data
[{'ID': 'N-11247', 'Name': 'Christine Cummings', 'Age': 22, 'Email': 'perezbill@example.net', 'Department': 'Finance', 'Salary': 149928.17, 'Phone': '(999)961-2703', 'Address': 'USNV Wheeler\\nFPO AP 08774', 'Date Added': '2024-11-06T15:50:39.000Z', 'Status': 'On Leave', 'Years of Experience': 8}, {'ID': 'N-48578', 'Name': 'Stephanie Owen', 'Age': 41, 'Email': 'nicholasharris@example.net', 'Department': 'Engineering', 'Salary': 56206.04, 'Phone': '981-354-1421', 'Address': '872 Shelby Neck Suite 854\\nSeanbury, IL 24904', 'Date Added': '2024-10-15T15:50:39.000Z', 'Status': 'Active', 'Years of Experience': 25}]

Integrating Generated Data with Airtable

In addition to generating realistic employee data, our script also provides functionality to integrate that data seamlessly with Airtable.

Setting up the Airtable Connection

Before we can start inserting our generated data into Airtable, we need to establish a connection to the platform. Our script uses the pyairtable library to interact with the Airtable API. We start by loading the necessary environment variables, including the Airtable API key and the Base ID and Table Name where we want to store the data:

import os
from dotenv import load_dotenv
from pyairtable import Api
import logging

# Load environment vars
load_dotenv()

# Credentials
API_KEY = os.getenv("API_KEY")
BASE_ID = os.getenv("BASE_ID")
TABLE_NAME = os.getenv("TABLE_NAME")

With these credentials, we can then initialize the Airtable API client and get a reference to the specific table we want to work with:

def main():
    # Initiate Connection
    api = Api(API_KEY)
    table = api.table(BASE_ID, TABLE_NAME)
Inserting the Generated Data

Now that we have the connection set up, we can use the generate_random_records() function from the previous section to create a batch of employee records, and then insert them into Airtable:

def main():
    # ... (connection setup)

    num_records = 50

    try:
        # Generate and prep. data
        auto_records = generate_random_records(num_records)
        prepd_records = prep_for_insertion(auto_records)

        # Insert Data
        print("inserting records... ")
        created_records = table.batch_create(prepd_records)
        print(f"Successfully inserted {len(created_records)} records")

    except Exception as e:
        logger.error(f"An error occurred: {str(e)}")
        raise

The prep_for_insertion() function is responsible for converting the nested record format returned by generate_random_records() into the flat format expected by the Airtable API. Once the data is prepared, we use the table.batch_create() method to insert the records in a single bulk operation.
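prep_for_insertion() itself isn't shown in the snippets above; a minimal sketch, assuming it simply mirrors prepare_records_for_airtable() from autoRecords.py, would be:

def prep_for_insertion(records):
    """Flatten nested {'fields': {...}} records into the plain dictionaries batch_create() expects."""
    return [record['fields'] for record in records]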

Error Handling and Logging

To ensure our integration process is robust and easy to debug, we've also included some basic error handling and logging functionality. If any errors occur during the data insertion process, the script will log the error message to help with troubleshooting:

import logging

# Set Up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def main():
    # ... (connection setup and data insertion)

    try:
        # Insert Data
        ...
    except Exception as e:
        logger.error(f"An error occurred: {str(e)}")
        raise

By combining the powerful data generation capabilities of our earlier script with the integration features shown here, you can quickly and reliably populate your Airtable-based applications with realistic employee data.

Scheduling Automated Data Ingestion with a Batch Script

To make the data ingestion process fully automated, we can create a batch script (.bat file) that will run the Python script on a regular schedule. This allows you to set up the data ingestion to happen automatically without manual intervention.

Here's an example of a batch script that can be used to run the ingestData.py script:

@echo off
echo Starting Airtable Automated Data Ingestion Service...

:: Project directory
cd /d <absolute project directory>

:: Activate virtual environment
call <absolute project directory>\<virtual environment>\Scripts\activate.bat

:: Run python script
python ingestData.py

:: Keep the window open if an error occurred
if %ERRORLEVEL% NEQ 0 (
    echo An error occurred! Error code: %ERRORLEVEL%
    pause
)

Let's break down the key parts of this script:

  1. @echo off: suppresses echoing of each command to the console, keeping the output clean.
  2. echo Starting Airtable Automated Data Ingestion Service...: prints a message to the console indicating that the script has started.
  3. cd /d <absolute project directory>: changes the current working directory to the project directory where the ingestData.py script lives.
  4. call <absolute project directory>\<virtual environment>\Scripts\activate.bat: activates the virtual environment that has the required Python dependencies installed.
  5. python ingestData.py: runs the ingestData.py Python script.
  6. if %ERRORLEVEL% NEQ 0 (...): checks whether the Python script exited with an error (i.e. ERRORLEVEL is non-zero). If so, it prints an error message and pauses so you can investigate the problem.

To have this batch script run automatically on a schedule, you can use the Windows Task Scheduler. Here's a brief overview of the steps:

  1. Open the Start menu and search for "Task Scheduler" (or press Windows + R and launch it from there).
  2. In Task Scheduler, create a new task and give it a descriptive name, e.g. "Airtable Data Ingestion".
  3. On the "Actions" tab, add a new action and point it at the batch script, e.g. <absolute project directory>\ingestData.bat.
  4. Configure the schedule on which the script should run: daily, weekly, monthly, etc.
  5. Save and enable the task.

[Screenshot: the scheduled task configured in Windows Task Scheduler]

Windows Task Scheduler will now run the batch script automatically at the specified interval, keeping your Airtable data regularly updated without any manual intervention.
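If you'd rather keep the scheduling inside Python, the schedule package already listed in requirements.txt can drive the same job. Here is a minimal sketch; the one-hour interval is just an example, and it assumes ingestData.py only calls main() under an if __name__ == "__main__" guard so that importing it doesn't trigger an immediate run:

import time

import schedule

from ingestData import main  # assumes main() is importable without side effects

# Run the ingestion once every hour (example interval)
schedule.every(1).hours.do(main)

while True:
    schedule.run_pending()
    time.sleep(60)  # check for due jobs once a minute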

Conclusion

This can be an extremely useful tool for testing, development and even demo purposes.

Throughout this guide, we've learned how to set up the required development environment, design the ingestion process, create a batch script to automate the task, and schedule the workflow for unattended execution. We now have a solid understanding of how to leverage local automation to streamline data ingestion and draw valuable insights from an Airtable-based data ecosystem.

Now that you have an automated data ingestion process in place, there are many ways to build on this foundation and get even more value out of your Airtable data. I encourage you to experiment with the code, explore new use cases and share your experience with the community.

Here are a few ideas to get you started:

  • Customize the data generation
  • Put the ingested data to work [Markdown-based exploratory data analysis (EDA), building interactive dashboards or visualizations with tools like Tableau, Power BI or Plotly, experimenting with machine-learning workflows (predicting employee turnover or identifying top performers)]
  • Integrate with other systems [cloud functions, webhooks or data warehouses]

The possibilities are endless! I'm excited to see how you build on this automated data ingestion process and unlock new insights and value from your Airtable data. Feel free to experiment, collaborate and share your progress. I'm here to support you along the way.

See the full code at https://github.com/AkanimohOD19A/scheduling_airtable_insertion; a full video tutorial is on its way.
