scikit-learn을 사용하여 PyTorch 모델에 대한 하이퍼파라미터 그리드 검색을 수행하는 방법은 무엇입니까?-일체 포함-php.cn

使用scikit-learn为PyTorch 模型进行超参数网格搜索

scikit-learn은 Python에서 최고의 기계 학습 라이브러리이며, PyTorch는 모델 구축을 위한 편리한 작업을 제공합니다. 이들의 장점을 통합할 수 있나요? 이 글에서는 scikit-learn에서 그리드 검색 기능을 사용하여 PyTorch 딥 러닝 모델의 하이퍼파라미터를 조정하는 방법을 다룹니다.

scikit-learn과 함께 사용할 PyTorch 모델을 래핑하는 방법 및 그리드 검색을 사용하는 방법
그리드가 학습률, 드롭아웃, 에포크, 뉴런 수와 같은 일반적인 신경망 매개변수를 검색하는 방법
자신의 프로젝트에서 하이퍼매개변수 조정 실험을 정의하세요.

scikit-learn에서 PyTorch 모델을 사용하는 방법

To One scikit-learn에서 PyTorch 모델을 사용할 수 있게 만드는 가장 쉬운 방법 중 하나는 skorch 패키지를 사용하는 것입니다. 이 패키지는 PyTorch 모델을 위한 scikit-learn 호환 API를 제공합니다. skorch에는 분류 신경망을 위한 NeuralNetClassifier와 회귀 신경망을 위한 NeuralNetRegressor가 있습니다.

1	`pip install skorch`

로그인 후 복사

이 래퍼를 사용하려면 nn.Module을 사용하여 PyTorch 모델을 클래스로 정의한 다음 NeuralNetClassifier 클래스를 생성할 때 클래스 이름을 모듈 매개 변수에 전달해야 합니다. 예:

class MyClassifier(nn.Module):
def __init__(self):
super().__init__()
...
  
def forward(self, x):
...
return x
  
 # create the skorch wrapper
 model = NeuralNetClassifier(
module=MyClassifier
 )

로그인 후 복사

NeuralNetClassifier 클래스의 생성자는 세대 수 및 배치 크기와 같은 model.fit() 호출(scikit-learn 모델에서 훈련 루프를 호출하는 방법)에 전달된 매개변수를 가져올 수 있습니다. 예:

model = NeuralNetClassifier(
module=MyClassifier,
max_epochs=150,
batch_size=10
 )

로그인 후 복사

NeuralNetClassifier 클래스의 생성자는 새 매개변수를 허용할 수도 있습니다. 이러한 매개변수는 모델 클래스의 생성자에 전달될 수 있습니다. 요구 사항은 module__(두 개의 밑줄)을 앞에 추가해야 한다는 것입니다. 이러한 새 매개변수는 생성자에서 기본값을 가질 수 있지만 래퍼가 모델을 인스턴스화할 때 재정의됩니다. 예:

import torch.nn as nn
 from skorch import NeuralNetClassifier
  
 class SonarClassifier(nn.Module):
def __init__(self, n_layers=3):
super().__init__()
self.layers = []
self.acts = []
for i in range(n_layers):
self.layers.append(nn.Linear(60, 60))
self.acts.append(nn.ReLU())
self.add_module(f"layer{i}", self.layers[-1])
self.add_module(f"act{i}", self.acts[-1])
self.output = nn.Linear(60, 1)
  
def forward(self, x):
for layer, act in zip(self.layers, self.acts):
x = act(layer(x))
x = self.output(x)
return x
  
 model = NeuralNetClassifier(
module=SonarClassifier,
max_epochs=150,
batch_size=10,
module__n_layers=2
 )

로그인 후 복사

모델을 초기화하고 인쇄하여 결과를 확인할 수 있습니다.

print(model.initialize())
  
 #结果如下：
 <class 'skorch.classifier.NeuralNetClassifier'>[initialized](
module_=SonarClassifier(
(layer0): Linear(in_features=60, out_features=60, bias=True)
(act0): ReLU()
(layer1): Linear(in_features=60, out_features=60, bias=True)
(act1): ReLU()
(output): Linear(in_features=60, out_features=1, bias=True)
),
 )

로그인 후 복사

scikit-learn에서 그리드 검색 사용

그리드 검색은 모델 하이퍼 매개변수 최적화 기술입니다. 단순히 하이퍼파라미터의 모든 조합을 소진하고 가장 좋은 점수를 제공하는 조합을 찾습니다. scikit-learn에서는 GridSearchCV 클래스가 이 기술을 제공합니다. 이 클래스를 생성할 때 param_grid 매개변수에 하이퍼 매개변수 사전을 제공해야 합니다. 시도해 볼 모델 매개변수 이름과 값 배열의 맵입니다.

Precision은 기본적으로 최적화 점수로 사용되지만 GridSearchCV 생성자의 점수 매개 변수에 다른 점수를 지정할 수 있습니다. GridSearchCV는 평가할 각 매개변수 조합에 대한 모델을 구축합니다. 그리고 매개변수를 통해 설정할 수 있는 기본 3겹 교차 검증을 사용하세요.

다음은 간단한 그리드 검색을 정의하는 예입니다.

param_grid = {
'epochs': [10,20,30]
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, Y)

로그인 후 복사

GridSearchCV 생성자의 n_jobs 매개변수를 -1로 설정하면 머신의 모든 코어가 사용된다는 의미입니다. 그렇지 않으면 그리드 검색 프로세스가 단일 스레드에서만 실행되므로 멀티 코어 CPU에서는 속도가 느려집니다.

실행 후, Grid.fit()이 반환한 결과 개체에서 그리드 검색 결과에 액세스할 수 있습니다. best_score는 최적화 중에 관찰된 최고 점수를 제공하고 best_params_는 최상의 결과를 달성한 매개변수의 조합을 설명합니다.

예제 문제 설명

우리의 예는 모두 당뇨병 발병 분류 데이터 세트인 소규모 표준 기계 학습 데이터 세트에서 시연됩니다. 이는 작은 데이터 세트이며 모든 숫자 속성은 쉽게 처리할 수 있습니다.

배치 크기와 학습 에포크 수를 조정하는 방법

첫 번째 간단한 예에서는 네트워크 피팅 시 사용되는 배치 크기와 에포크 수를 조정하는 방법을 소개합니다.

우리는 단순히 10에서 100까지 다양한 배치 크기를 평가할 것입니다. 코드 목록은 다음과 같습니다.

import random
 import numpy as np
 import torch
 import torch.nn as nn
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
  
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
  
 # PyTorch classifier
 class PimaClassifier(nn.Module):
def __init__(self):
super().__init__()
self.layer = nn.Linear(8, 12)
self.act = nn.ReLU()
self.output = nn.Linear(12, 1)
self.prob = nn.Sigmoid()
  
def forward(self, x):
x = self.act(self.layer(x))
x = self.prob(self.output(x))
return x
  
 # create model with skorch
 model = NeuralNetClassifier(
PimaClassifier,
criterinotallow=nn.BCELoss,
optimizer=optim.Adam,
verbose=False
 )
  
 # define the grid search parameters
 param_grid = {
'batch_size': [10, 20, 40, 60, 80, 100],
'max_epochs': [10, 50, 100]
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
  
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))

로그인 후 복사

결과는 다음과 같습니다.

Best: 0.714844 using {'batch_size': 10, 'max_epochs': 100}
 0.665365 (0.020505) with: {'batch_size': 10, 'max_epochs': 10}
 0.588542 (0.168055) with: {'batch_size': 10, 'max_epochs': 50}
 0.714844 (0.032369) with: {'batch_size': 10, 'max_epochs': 100}
 0.671875 (0.022326) with: {'batch_size': 20, 'max_epochs': 10}
 0.696615 (0.008027) with: {'batch_size': 20, 'max_epochs': 50}
 0.714844 (0.019918) with: {'batch_size': 20, 'max_epochs': 100}
 0.666667 (0.009744) with: {'batch_size': 40, 'max_epochs': 10}
 0.687500 (0.033603) with: {'batch_size': 40, 'max_epochs': 50}
 0.707031 (0.024910) with: {'batch_size': 40, 'max_epochs': 100}
 0.667969 (0.014616) with: {'batch_size': 60, 'max_epochs': 10}
 0.694010 (0.036966) with: {'batch_size': 60, 'max_epochs': 50}
 0.694010 (0.042473) with: {'batch_size': 60, 'max_epochs': 100}
 0.670573 (0.023939) with: {'batch_size': 80, 'max_epochs': 10}
 0.674479 (0.020752) with: {'batch_size': 80, 'max_epochs': 50}
 0.703125 (0.026107) with: {'batch_size': 80, 'max_epochs': 100}
 0.680990 (0.014382) with: {'batch_size': 100, 'max_epochs': 10}
 0.670573 (0.013279) with: {'batch_size': 100, 'max_epochs': 50}
 0.687500 (0.017758) with: {'batch_size': 100, 'max_epochs': 100}

로그인 후 복사

'batch_size': 10, 'max_epochs': 100이 대략 달성되는 것을 볼 수 있습니다. 71% 정밀도에 대한 최상의 결과.

트레이닝 옵티마이저 조정 방법

옵티마이저 조정 방법을 살펴보겠습니다. SDG, Adam 등 선택할 수 있는 옵티마이저가 많은데 어떻게 선택해야 할까요?

전체 코드는 다음과 같습니다.

import numpy as np
 import torch
 import torch.nn as nn
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
  
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
  
 # PyTorch classifier
 class PimaClassifier(nn.Module):
def __init__(self):
super().__init__()
self.layer = nn.Linear(8, 12)
self.act = nn.ReLU()
self.output = nn.Linear(12, 1)
self.prob = nn.Sigmoid()
  
def forward(self, x):
x = self.act(self.layer(x))
x = self.prob(self.output(x))
return x
  
 # create model with skorch
 model = NeuralNetClassifier(
PimaClassifier,
criterinotallow=nn.BCELoss,
max_epochs=100,
batch_size=10,
verbose=False
 )
  
 # define the grid search parameters
 param_grid = {
'optimizer': [optim.SGD, optim.RMSprop, optim.Adagrad, optim.Adadelta,
optim.Adam, optim.Adamax, optim.NAdam],
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
  
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))

로그인 후 복사

출력은 다음과 같습니다.

Best: 0.721354 using {'optimizer': <class 'torch.optim.adamax.Adamax'>}
 0.674479 (0.036828) with: {'optimizer': <class 'torch.optim.sgd.SGD'>}
 0.700521 (0.043303) with: {'optimizer': <class 'torch.optim.rmsprop.RMSprop'>}
 0.682292 (0.027126) with: {'optimizer': <class 'torch.optim.adagrad.Adagrad'>}
 0.572917 (0.051560) with: {'optimizer': <class 'torch.optim.adadelta.Adadelta'>}
 0.714844 (0.030758) with: {'optimizer': <class 'torch.optim.adam.Adam'>}
 0.721354 (0.019225) with: {'optimizer': <class 'torch.optim.adamax.Adamax'>}
 0.709635 (0.024360) with: {'optimizer': <class 'torch.optim.nadam.NAdam'>}

로그인 후 복사

Adamax 최적화 알고리즘이 약 72%의 정확도로 우리 모델과 데이터 세트에 가장 적합하다는 것을 알 수 있습니다.

학습률 조정 방법

pytorch의 학습률 계획을 사용하면 라운드에 따라 학습률을 동적으로 조정할 수 있지만, 예를 들어 학습률과 학습률의 매개변수를 그리드 검색. PyTorch에서 학습률과 모멘텀을 설정하는 방법은 다음과 같습니다.

1	`optimizer = optim.SGD(lr=0.001, momentum=0.9)`

로그인 후 복사

skorch 패키지에서는 접두어 최적화 도구__를 사용하여 매개변수를 최적화 도구로 라우팅합니다.

import numpy as np
 import torch
 import torch.nn as nn
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
  
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
  
 # PyTorch classifier
 class PimaClassifier(nn.Module):
def __init__(self):
super().__init__()
self.layer = nn.Linear(8, 12)
self.act = nn.ReLU()
self.output = nn.Linear(12, 1)
self.prob = nn.Sigmoid()
  
def forward(self, x):
x = self.act(self.layer(x))
x = self.prob(self.output(x))
return x
  
 # create model with skorch
 model = NeuralNetClassifier(
PimaClassifier,
criterinotallow=nn.BCELoss,
optimizer=optim.SGD,
max_epochs=100,
batch_size=10,
verbose=False
 )
  
 # define the grid search parameters
 param_grid = {
'optimizer__lr': [0.001, 0.01, 0.1, 0.2, 0.3],
'optimizer__momentum': [0.0, 0.2, 0.4, 0.6, 0.8, 0.9],
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
  
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))

로그인 후 복사

결과는 다음과 같습니다.

Best: 0.682292 using {'optimizer__lr': 0.001, 'optimizer__momentum': 0.9}
 0.648438 (0.016877) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.0}
 0.671875 (0.017758) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.2}
 0.674479 (0.022402) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.4}
 0.677083 (0.011201) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.6}
 0.679688 (0.027621) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.8}
 0.682292 (0.026557) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.9}
 0.671875 (0.019918) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.0}
 0.648438 (0.024910) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.2}
 0.546875 (0.143454) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.4}
 0.567708 (0.153668) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.6}
 0.552083 (0.141790) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.8}
 0.451823 (0.144561) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.9}
 0.348958 (0.001841) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.0}
 0.450521 (0.142719) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.2}
 0.450521 (0.142719) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.4}
 0.450521 (0.142719) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.6}
 0.348958 (0.001841) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.8}
 0.348958 (0.001841) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.9}
 0.444010 (0.136265) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.0}
 0.450521 (0.142719) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.2}
 0.348958 (0.001841) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.4}
 0.552083 (0.141790) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.6}
 0.549479 (0.142719) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.8}
 0.651042 (0.001841) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.9}
 0.552083 (0.141790) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.0}
 0.348958 (0.001841) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.2}
 0.450521 (0.142719) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.4}
 0.552083 (0.141790) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.6}
 0.450521 (0.142719) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.8}
 0.450521 (0.142719) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.9}

로그인 후 복사

SGD의 경우 학습률 0.001, 운동량 0.9를 사용하여 약 68%의 정확도로 가장 좋은 결과를 얻었습니다.

함수 활성화 방법

활성화 함수는 단일 뉴런의 비선형성을 제어합니다. PyTorch에서 사용할 수 있는 활성화 함수 중 일부를 평가하는 방법을 보여 드리겠습니다.

import numpy as np
 import torch
 import torch.nn as nn
 import torch.nn.init as init
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
  
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
  
 # PyTorch classifier
 class PimaClassifier(nn.Module):
def __init__(self, activatinotallow=nn.ReLU):
super().__init__()
self.layer = nn.Linear(8, 12)
self.act = activation()
self.output = nn.Linear(12, 1)
self.prob = nn.Sigmoid()
# manually init weights
init.kaiming_uniform_(self.layer.weight)
init.kaiming_uniform_(self.output.weight)
  
def forward(self, x):
x = self.act(self.layer(x))
x = self.prob(self.output(x))
return x
  
 # create model with skorch
 model = NeuralNetClassifier(
PimaClassifier,
criterinotallow=nn.BCELoss,
optimizer=optim.Adamax,
max_epochs=100,
batch_size=10,
verbose=False
 )
  
 # define the grid search parameters
 param_grid = {
'module__activation': [nn.Identity, nn.ReLU, nn.ELU, nn.ReLU6,
nn.GELU, nn.Softplus, nn.Softsign, nn.Tanh,
nn.Sigmoid, nn.Hardsigmoid]
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
  
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))

로그인 후 복사

결과는 다음과 같습니다.

Best: 0.699219 using {'module__activation': <class 'torch.nn.modules.activation.ReLU'>}
 0.687500 (0.025315) with: {'module__activation': <class 'torch.nn.modules.linear.Identity'>}
 0.699219 (0.011049) with: {'module__activation': <class 'torch.nn.modules.activation.ReLU'>}
 0.674479 (0.035849) with: {'module__activation': <class 'torch.nn.modules.activation.ELU'>}
 0.621094 (0.063549) with: {'module__activation': <class 'torch.nn.modules.activation.ReLU6'>}
 0.674479 (0.017566) with: {'module__activation': <class 'torch.nn.modules.activation.GELU'>}
 0.558594 (0.149189) with: {'module__activation': <class 'torch.nn.modules.activation.Softplus'>}
 0.675781 (0.014616) with: {'module__activation': <class 'torch.nn.modules.activation.Softsign'>}
 0.619792 (0.018688) with: {'module__activation': <class 'torch.nn.modules.activation.Tanh'>}
 0.643229 (0.019225) with: {'module__activation': <class 'torch.nn.modules.activation.Sigmoid'>}
 0.636719 (0.022326) with: {'module__activation': <class 'torch.nn.modules.activation.Hardsigmoid'>}

로그인 후 복사

ReLU 활성화 함수는 약 70%의 정확도로 가장 좋은 결과를 얻었습니다.

如何调整Dropout参数

在本例中，我们将尝试在0.0到0.9之间的dropout百分比(1.0没有意义)和在0到5之间的MaxNorm权重约束值。

import numpy as np
 import torch
 import torch.nn as nn
 import torch.nn.init as init
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
  
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
  
 # PyTorch classifier
 class PimaClassifier(nn.Module):
def __init__(self, dropout_rate=0.5, weight_cnotallow=1.0):
super().__init__()
self.layer = nn.Linear(8, 12)
self.act = nn.ReLU()
self.dropout = nn.Dropout(dropout_rate)
self.output = nn.Linear(12, 1)
self.prob = nn.Sigmoid()
self.weight_constraint = weight_constraint
# manually init weights
init.kaiming_uniform_(self.layer.weight)
init.kaiming_uniform_(self.output.weight)
  
def forward(self, x):
# maxnorm weight before actual forward pass
with torch.no_grad():
norm = self.layer.weight.norm(2, dim=0, keepdim=True).clamp(min=self.weight_constraint / 2)
desired = torch.clamp(norm, max=self.weight_constraint)
self.layer.weight *= (desired / norm)
# actual forward pass
x = self.act(self.layer(x))
x = self.dropout(x)
x = self.prob(self.output(x))
return x
  
 # create model with skorch
 model = NeuralNetClassifier(
PimaClassifier,
criterinotallow=nn.BCELoss,
optimizer=optim.Adamax,
max_epochs=100,
batch_size=10,
verbose=False
 )
  
 # define the grid search parameters
 param_grid = {
'module__weight_constraint': [1.0, 2.0, 3.0, 4.0, 5.0],
'module__dropout_rate': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
  
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))

로그인 후 복사

结果如下：

Best: 0.701823 using {'module__dropout_rate': 0.1, 'module__weight_constraint': 2.0}
 0.669271 (0.015073) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 1.0}
 0.692708 (0.035132) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 2.0}
 0.589844 (0.170180) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 3.0}
 0.561198 (0.151131) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 4.0}
 0.688802 (0.021710) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 5.0}
 0.697917 (0.009744) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 1.0}
 0.701823 (0.016367) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 2.0}
 0.694010 (0.010253) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 3.0}
 0.686198 (0.025976) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 4.0}
 0.679688 (0.026107) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 5.0}
 0.701823 (0.029635) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 1.0}
 0.682292 (0.014731) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 2.0}
 0.701823 (0.009744) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 3.0}
 0.701823 (0.026557) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 4.0}
 0.687500 (0.015947) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 5.0}
 0.686198 (0.006639) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 1.0}
 0.656250 (0.006379) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 2.0}
 0.565104 (0.155608) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 3.0}
 0.700521 (0.028940) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 4.0}
 0.669271 (0.012890) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 5.0}
 0.661458 (0.018688) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 1.0}
 0.669271 (0.017566) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 2.0}
 0.652344 (0.006379) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 3.0}
 0.680990 (0.037783) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 4.0}
 0.692708 (0.042112) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 5.0}
 0.666667 (0.006639) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 1.0}
 0.652344 (0.011500) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 2.0}
 0.662760 (0.007366) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 3.0}
 0.558594 (0.146610) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 4.0}
 0.552083 (0.141826) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 5.0}
 0.548177 (0.141826) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 1.0}
 0.653646 (0.013279) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 2.0}
 0.661458 (0.008027) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 3.0}
 0.553385 (0.142719) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 4.0}
 0.669271 (0.035132) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 5.0}
 0.662760 (0.015733) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 1.0}
 0.636719 (0.024910) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 2.0}
 0.550781 (0.146818) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 3.0}
 0.537760 (0.140094) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 4.0}
 0.542969 (0.138144) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 5.0}
 0.565104 (0.148654) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 1.0}
 0.657552 (0.008027) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 2.0}
 0.428385 (0.111418) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 3.0}
 0.549479 (0.142719) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 4.0}
 0.648438 (0.005524) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 5.0}
 0.540365 (0.136861) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 1.0}
 0.605469 (0.053083) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 2.0}
 0.553385 (0.139948) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 3.0}
 0.549479 (0.142719) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 4.0}
 0.595052 (0.075566) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 5.0}

로그인 후 복사

可以看到，10%的Dropout和2.0的权重约束获得了70%的最佳精度。

如何调整隐藏层神经元的数量

单层神经元的数量是一个需要调优的重要参数。一般来说，一层神经元的数量控制着网络的表示能力，至少在拓扑的这一点上是这样。

理论上来说：由于通用逼近定理，一个足够大的单层网络可以近似任何其他神经网络。

在本例中，将尝试从1到30的值，步骤为5。一个更大的网络需要更多的训练，至少批大小和epoch的数量应该与神经元的数量一起优化。

import numpy as np
 import torch
 import torch.nn as nn
 import torch.nn.init as init
 import torch.optim as optim
 from skorch import NeuralNetClassifier
 from sklearn.model_selection import GridSearchCV
  
 # load the dataset, split into input (X) and output (y) variables
 dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
 X = dataset[:,0:8]
 y = dataset[:,8]
 X = torch.tensor(X, dtype=torch.float32)
 y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
  
 class PimaClassifier(nn.Module):
def __init__(self, n_neurnotallow=12):
super().__init__()
self.layer = nn.Linear(8, n_neurons)
self.act = nn.ReLU()
self.dropout = nn.Dropout(0.1)
self.output = nn.Linear(n_neurons, 1)
self.prob = nn.Sigmoid()
self.weight_constraint = 2.0
# manually init weights
init.kaiming_uniform_(self.layer.weight)
init.kaiming_uniform_(self.output.weight)
  
def forward(self, x):
# maxnorm weight before actual forward pass
with torch.no_grad():
norm = self.layer.weight.norm(2, dim=0, keepdim=True).clamp(min=self.weight_constraint / 2)
desired = torch.clamp(norm, max=self.weight_constraint)
self.layer.weight *= (desired / norm)
# actual forward pass
x = self.act(self.layer(x))
x = self.dropout(x)
x = self.prob(self.output(x))
return x
  
 # create model with skorch
 model = NeuralNetClassifier(
PimaClassifier,
criterinotallow=nn.BCELoss,
optimizer=optim.Adamax,
max_epochs=100,
batch_size=10,
verbose=False
 )
  
 # define the grid search parameters
 param_grid = {
'module__n_neurons': [1, 5, 10, 15, 20, 25, 30]
 }
 grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
 grid_result = grid.fit(X, y)
  
 # summarize results
 print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
 means = grid_result.cv_results_['mean_test_score']
 stds = grid_result.cv_results_['std_test_score']
 params = grid_result.cv_results_['params']
 for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))

로그인 후 복사

结果如下：

Best: 0.708333 using {'module__n_neurons': 30}
 0.654948 (0.003683) with: {'module__n_neurons': 1}
 0.666667 (0.023073) with: {'module__n_neurons': 5}
 0.694010 (0.014382) with: {'module__n_neurons': 10}
 0.682292 (0.014382) with: {'module__n_neurons': 15}
 0.707031 (0.028705) with: {'module__n_neurons': 20}
 0.703125 (0.030758) with: {'module__n_neurons': 25}
 0.708333 (0.015733) with: {'module__n_neurons': 30}