ClassiSage: Terraform IaC Automated AWS SageMaker-based HDFS Log Classification Model

Barbara Streisand
Published: 2024-10-26 05:04:30


A machine learning model built with AWS SageMaker and its Python SDK for HDFS log classification, with the infrastructure setup automated using Terraform.

Link: GitHub
Languages: HCL (Terraform), Python

Contents

  • Overview: project overview.
  • System Architecture: system architecture diagram.
  • ML Model: overview of the model.
  • Getting Started: how to run the project.
  • Console Observations: changes to instances and infrastructure that can be observed while running the project.
  • Ending and Cleanup: ensuring no additional charges are incurred.
  • Auto-Created Objects: files and folders created during execution.

  • Follow the directory structure first for a smoother project setup.
  • Take the main reference from the ClassiSage project repository uploaded to GitHub for a better understanding.

Overview

  • The model performs HDFS log classification with AWS SageMaker, using S3 to store the dataset, the notebook file (containing the code for the SageMaker instance), and the model output.
  • The infrastructure setup is automated with Terraform, a tool created by HashiCorp that provides infrastructure as code.
  • The dataset used is HDFS_v1.
  • The project implements the SageMaker Python SDK with the XGBoost model, version 1.2.

System Architecture

[System architecture diagram]

ML Model

  • Image URI
  # Looks for the XGBoost image URI and builds an XGBoost container. Specify the repo_version depending on preference.
  import boto3
  from sagemaker.amazon.amazon_estimator import get_image_uri

  container = get_image_uri(boto3.Session().region_name,
                            'xgboost',
                            repo_version='1.0-1')


  • Initializing hyperparameters and the estimator call for the container
  hyperparameters = {
        "max_depth":"5",                ## Maximum depth of a tree. Higher means more complex models but risk of overfitting.
        "eta":"0.2",                    ## Learning rate. Lower values make the learning process slower but more precise.
        "gamma":"4",                    ## Minimum loss reduction required to make a further partition on a leaf node. Controls the model’s complexity.
        "min_child_weight":"6",         ## Minimum sum of instance weight (hessian) needed in a child. Higher values prevent overfitting.
        "subsample":"0.7",              ## Fraction of training data used. Reduces overfitting by sampling part of the data. 
        "objective":"binary:logistic",  ## Specifies the learning task and corresponding objective. binary:logistic is for binary classification.
        "num_round":50                  ## Number of boosting rounds, essentially how many times the model is trained.
        }
  # A SageMaker estimator that calls the xgboost-container
  estimator = sagemaker.estimator.Estimator(image_uri=container,                  # Points to the XGBoost container we previously set up. This tells SageMaker which algorithm container to use.
                                          hyperparameters=hyperparameters,      # Passes the defined hyperparameters to the estimator. These are the settings that guide the training process.
                                          role=sagemaker.get_execution_role(),  # Specifies the IAM role that SageMaker assumes during the training job. This role allows access to AWS resources like S3.
                                          train_instance_count=1,               # Sets the number of training instances. Here, it’s using a single instance.
                                          train_instance_type='ml.m5.large',    # Specifies the type of instance to use for training. ml.m5.large is a general-purpose instance with a balance of compute, memory, and network resources.
                                          train_volume_size=5, # 5GB            # Sets the size of the storage volume attached to the training instance, in GB. Here, it’s 5 GB.
                                          output_path=output_path,              # Defines where the model artifacts and output of the training job will be saved in S3.
                                          train_use_spot_instances=True,        # Utilizes spot instances for training, which can be significantly cheaper than on-demand instances. Spot instances are spare EC2 capacity offered at a lower price.
                                          train_max_run=300,                    # Specifies the maximum runtime for the training job in seconds. Here, it's 300 seconds (5 minutes).
                                          train_max_wait=600)                   # Sets the maximum time to wait for the job to complete, including the time waiting for spot instances, in seconds. Here, it's 600 seconds (10 minutes).


  • Training job
  estimator.fit({'train': s3_input_train,'validation': s3_input_test})


  • Deployment
  xgb_predictor = estimator.deploy(initial_instance_count=1,instance_type='ml.m5.large')


  • Validation
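  A minimal sketch of validating the deployed endpoint, assuming a NumPy test-feature array named test_data_array (the array name is illustrative) and the SageMaker Python SDK v1 style used elsewhere in this project:

  # Hypothetical sketch: invoke the deployed endpoint and parse its CSV response.
  import numpy as np
  from sagemaker.predictor import csv_serializer

  xgb_predictor.content_type = 'text/csv'        # Send the payload as CSV.
  xgb_predictor.serializer = csv_serializer      # Serialize the NumPy array to CSV.

  predictions = xgb_predictor.predict(test_data_array).decode('utf-8')  # Call the endpoint.
  predictions_array = np.fromstring(predictions, sep=',')               # Parse CSV into an array.
  print(predictions_array.shape)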

ClassiSage: Terraform IaC Automated AWS SageMaker based HDFS Log classification Model

Getting Started

  • Clone the repository / download the .zip file / fork the repository using Git Bash.
  • Go to your AWS Management Console, click on your account profile in the top-right corner, and select My Security Credentials from the dropdown.
  • Create an access key: in the Access keys section, click Create New Access Key; a dialog box will appear with your Access Key ID and Secret Access Key.
  • Download or copy the keys: (IMPORTANT) download the .csv file or copy the keys to a secure location. This is the only time you can view the Secret Access Key.
  • Open the cloned repository in VS Code.
  • Under ClassiSage, create a file named terraform.tfvars; a sketch of its contents follows below.
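  An illustrative sketch of terraform.tfvars, assuming the configuration declares variables for the AWS credentials and region (the variable names must match those declared in the repository's Terraform variables):

  # Illustrative terraform.tfvars; fill in your own values and never commit this file.
  access_key = "<YOUR_AWS_ACCESS_KEY_ID>"
  secret_key = "<YOUR_AWS_SECRET_ACCESS_KEY>"
  region     = "<YOUR_AWS_REGION>"   # e.g. us-east-1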
  • Download and install all dependencies for using Terraform and Python.
  • In the terminal, type/paste terraform init to initialize the backend.

  • Then type/paste terraform plan to view the plan, or simply terraform validate to ensure there are no errors.

  • Finally, type/paste terraform apply --auto-approve in the terminal (the full command sequence is summarized after this list).

  • This will show two outputs, one as bucket_name and the other as pretrained_ml_instance_name (the third resource is the variable name given to the bucket, since buckets are global resources).
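  The terminal workflow described above, consolidated (run from the ClassiSage directory):

  terraform init                    # Initialize the backend and download providers.
  terraform validate                # Check the configuration for errors.
  terraform plan                    # Preview the resources to be created.
  terraform apply --auto-approve    # Create the S3 bucket and SageMaker notebook instance.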


  • After the command completes in the terminal, navigate to ClassiSage/ml_ops/function.py and, on line 11 of the file, add the code sketched below
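  A hypothetical placeholder (the variable name and value are illustrative; use your actual local path):

  # Illustrative only: the absolute path to your local project directory.
  path = "<path_to_your_project_directory>/ClassiSage"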

and change it to the path where your project directory is located, then save the file.

  • Then run all code cells in ClassiSage/ml_ops/data_upload.ipynb up to cell number 25 to upload the dataset to the S3 bucket; a sketch of the upload step follows below.
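  A minimal sketch of uploading the dataset with boto3 (the file and key names are illustrative, following the final_dataset.csv file mentioned later in this article):

  # Upload the local dataset to the bucket created by terraform apply.
  import boto3

  s3 = boto3.client('s3')
  s3.upload_file('final_dataset.csv',   # Local dataset file.
                 bucket_name,           # Bucket name output by terraform apply.
                 'final_dataset.csv')   # Object key in the bucket.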

  • Output of the code cell execution:


  • After executing the notebook, reopen your AWS Management Console.
  • You can search for the S3 and SageMaker services; you will see one launched instance of each (an S3 bucket and a SageMaker notebook).

An S3 bucket named "data-bucket-" with 2 objects uploaded: the dataset and the pretrained_sm.ipynb file containing the model's code.



  • Go to the notebook instance in AWS SageMaker, click on the created instance, and click Open Jupyter.
  • After that, click New in the top-right corner of the window and select Terminal.
  • This will create a new terminal.

  • Paste the command sketched below into the terminal (replacing <bucket_name> with the bucket_name output shown in the VS Code terminal):
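  A sketch of the copy command, assuming the default Jupyter home directory on SageMaker notebook instances (/home/ec2-user/SageMaker):

  # Copy the notebook from the S3 bucket into the Jupyter environment.
  aws s3 cp s3://<bucket_name>/pretrained_sm.ipynb /home/ec2-user/SageMaker/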

Terminal command for copying pretrained_sm.ipynb from S3 into the notebook's Jupyter environment.



  • Return to the opened Jupyter instance, click on the pretrained_sm.ipynb file to open it, and assign it the conda_python3 kernel.
  • Scroll down to the fourth cell and replace the value of the variable bucket_name with the bucket_name output from the VS Code terminal.
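  For example (the value is an illustrative placeholder):

  # Paste the bucket_name output from terraform apply.
  bucket_name = "<bucket_name>"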

Output of the code cell execution:



  • At the top of the file, go to the Kernel tab and restart the kernel.
  • Run the notebook up to code cell number 27.
  • You will get the expected results: the data will be fetched, adjusted for labels and features with a defined output path, and split into training and test sets; then, using SageMaker's Python SDK, the model will be trained, deployed as an endpoint, and validated to provide the various metrics.

Console Observation Notes

Running the 8th cell

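  A minimal sketch of what this cell defines (illustrative; the prefix follows the "pretrained-algo" folder name this project uses for model output):

  # Define the S3 output path where the model artifacts will be stored.
  prefix = 'pretrained-algo'
  output_path = 's3://{}/{}/output'.format(bucket_name, prefix)
  print(output_path)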
  • This will set up the output path in S3 to store the model data.


Running the 23rd cell

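  This is presumably the training call shown earlier in the ML Model section:

  # Launch the SageMaker training job on the train/validation channels.
  estimator.fit({'train': s3_input_train, 'validation': s3_input_test})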
  • The training job will start; you can view it under the Training tab.


  • After some time (expect around 3 minutes), it will be completed and shown as such.


Running the 24th code cell

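  This is presumably the deployment call shown earlier in the ML Model section:

  # Deploy the trained model as a real-time inference endpoint.
  xgb_predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m5.large')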
  • An endpoint will be deployed; it appears under the Inference tab.


Additional console observations:

  • An endpoint configuration is created under the Inference tab.


  • A model is also created under the Inference tab.



Ending and Cleanup

  • Return to data_upload.ipynb in VS Code and run the last 2 code cells to download the S3 bucket's data to the local system.
  • The downloaded folder will be named downloaded_bucket_content.


  • You will get logs of the downloaded files in the output cell. They will include the original pretrained_sm.ipynb, final_dataset.csv, and a model output folder named "pretrained-algo" containing the execution data of the SageMaker code file.
  • Finally, go into pretrained_sm.ipynb inside the SageMaker instance and run the last 2 code cells. The endpoint and the resources inside the S3 bucket will be deleted to ensure no additional charges are incurred.
  • Deleting the endpoint; a sketch follows below:
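  A minimal sketch in the SDK v1 style used throughout this project, assuming xgb_predictor is the deployed predictor:

  # Delete the inference endpoint so it stops accruing charges.
  import sagemaker
  sagemaker.Session().delete_endpoint(xgb_predictor.endpoint)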


  • Emptying the S3 bucket (required before the instances can be destroyed); a sketch follows below:
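  A minimal sketch using boto3 (Terraform cannot destroy a non-empty bucket, so its objects are deleted first):

  # Delete every object in the bucket so terraform destroy can remove it.
  import boto3

  bucket_to_delete = boto3.resource('s3').Bucket(bucket_name)
  bucket_to_delete.objects.all().delete()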
  • Return to the VS Code terminal for the project files and type/paste terraform destroy --auto-approve.
  • All created resource instances will be deleted.

Auto-Created Objects

ClassiSage/downloaded_bucket_content
ClassiSage/.terraform
ClassiSage/ml_ops/__pycache__
ClassiSage/.terraform.lock.hcl
ClassiSage/terraform.tfstate
ClassiSage/terraform.tfstate.backup

Note:
If you liked the idea and implementation of this machine learning project, which uses AWS Cloud's S3 and SageMaker for HDFS log classification and Terraform for IaC (infrastructure setup automation), please consider liking this post and starring the project repository after checking it out on GitHub.


Source: dev.to