Using VGG for face and gender recognition

How to build a Python project that recognizes faces and gender using deep learning and VGG16.

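What is deep learning?

Deep learning is a subcategory of machine learning that uses neural networks with three or more layers. These networks attempt to simulate the behavior of the human brain by learning from large amounts of data. A neural network with a single layer can still make approximate predictions, but additional hidden layers help optimize and refine its accuracy.

Deep learning improves automation by performing tasks without human intervention. It is used in digital assistants, voice-enabled TV remotes, credit card fraud detection, self-driving cars, and more.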

Building the Python project

The full code is available on GitHub: https://github.com/alexiacismaru/face-recognision

Download the VGG Face dataset and the Haar Cascade XML file used for face detection. Both are used for preprocessing in the face recognition task.

import os
import tarfile
from urllib import request

import cv2

base_path = "."  # assumed working directory for the downloads; adjust as needed

# download the VGG Face dataset
vgg_face_dataset_url = "http://www.robots.ox.ac.uk/~vgg/data/vgg_face/vgg_face_dataset.tar.gz"
with request.urlopen(vgg_face_dataset_url) as r, open(os.path.join(base_path, "vgg_face_dataset.tar.gz"), 'wb') as f:
  f.write(r.read())

# extract the VGG Face dataset
with tarfile.open(os.path.join(base_path, "vgg_face_dataset.tar.gz")) as f:
  f.extractall(os.path.join(base_path))

# download the Haar Cascade for face detection
trained_haarcascade_url = "https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_default.xml"
with request.urlopen(trained_haarcascade_url) as r, open(os.path.join(base_path, "haarcascade_frontalface_default.xml"), 'wb') as f:
  f.write(r.read())

# load the cascade only after the file exists; it detects faces in images
faceCascade = cv2.CascadeClassifier(os.path.join(base_path, "haarcascade_frontalface_default.xml"))


From the VGG Face dataset, selectively load and process a fixed number of images for a predefined set of subjects.

# populate the list with the files of the celebrities that will be used for face recognition
# (startswith accepts a tuple, so the ".txt" check now applies to every subject, not only the last one)
subject_names = ("Jesse_Eisenberg", "Sarah_Hyland", "Michael_Cera", "Mila_Kunis")
all_subjects = [subject for subject in sorted(os.listdir(os.path.join(base_path, "vgg_face_dataset", "files"))) if subject.startswith(subject_names) and subject.endswith(".txt")]

# define number of subjects and how many pictures to extract
nb_subjects = 4
nb_images_per_subject = 40


Iterate over the subjects by opening the text file associated with each one and reading its contents. Each line in these files contains a URL to an image. For each URL, the code tries to load the image with urllib and convert it into a NumPy array.
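The URL slicing can be tried in isolation. The following is a minimal sketch; the helper name `extract_jpg_url` and the sample line are made up for illustration, and the exact column layout of the dataset files is an assumption.

```python
def extract_jpg_url(line):
    """Return the substring from 'http://' through '.jpg', or None if either is missing."""
    start = line.find("http://")
    end = line.find(".jpg", start)
    if start == -1 or end == -1:
        return None
    return line[start:end + len(".jpg")]


# hypothetical line: an id, a URL, then some numeric fields
sample_line = "00000001 http://example.com/photos/portrait.jpg 10.0 20.0 110.0 120.0"
print(extract_jpg_url(sample_line))
```

Returning None for lines without a usable URL makes the failure explicit instead of relying on a failed download later.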

import numpy as np
from google.colab.patches import cv2_imshow  # assumes the code runs in a Google Colab notebook

images = []

for subject in all_subjects[:nb_subjects]:
  with open(os.path.join(base_path, "vgg_face_dataset", "files", subject), 'r') as f:
    lines = f.readlines()

  images_ = []
  for line in lines:
    # keep the part of the line from "http://" through ".jpg"
    url = line[line.find("http://"): line.find(".jpg") + 4]

    try:
      res = request.urlopen(url)
      img = np.asarray(bytearray(res.read()), dtype="uint8")
      # decode the raw bytes into a color (BGR) image for OpenCV
      img = cv2.imdecode(img, cv2.IMREAD_COLOR)
      h, w = img.shape[:2]
      images_.append(img)
      # show a preview at one fifth of the original size
      cv2_imshow(cv2.resize(img, (w // 5, h // 5)))

    except Exception:
      # skip dead links and responses that do not decode to an image
      pass

    # check if the required number of images has been reached
    if len(images_) == nb_images_per_subject:
      # add the list of images to the main images list and move to the next subject
      images.append(images_)
      break


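Setting up face detection

Locate one or more faces in the image and mark each with a bounding box.
Normalize the face so that it is consistent with the database, for example in geometry and photometrics.
Extract features from the face that can be used for the recognition task.
Match the face against one or more known faces in a prepared database.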

# create arrays for all 4 celebrities
jesse_images = []
michael_images = []
mila_images = []
sarah_images = []

faceCascade = cv2.CascadeClassifier(os.path.join(base_path, "haarcascade_frontalface_default.xml"))

# iterate over the subjects
for subject, images_ in zip(all_subjects, images):

  # create a grayscale copy to simplify the image and reduce computation
  for img in images_:
    img_ = img.copy()
    img_gray = cv2.cvtColor(img_, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(
        img_gray,
        scaleFactor=1.2,
        minNeighbors=5,
        minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE
    )
    print("Found {} face(s)!".format(len(faces)))

    for (x, y, w, h) in faces:
        cv2.rectangle(img_, (x, y), (x+w, y+h), (0, 255, 0), 10)

    # resize to 224x224, the input size VGG16 expects
    resized_img = cv2.resize(img_, (224, 224))
    cv2_imshow(resized_img)

    if "Jesse_Eisenberg" in subject:
        jesse_images.append(resized_img)
    elif "Michael_Cera" in subject:
        michael_images.append(resized_img)
    elif "Mila_Kunis" in subject:
        mila_images.append(resized_img)
    elif "Sarah_Hyland" in subject:
        sarah_images.append(resized_img)


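The detectMultiScale method detects faces in an image. It returns the coordinates of the rectangles where it believes faces are located, as (x, y, w, h) values. A rectangle is drawn around each face to mark its location, and each image is resized to 224x224 pixels.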
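Working with the returned rectangles is plain arithmetic on (x, y, w, h) values. The sketch below uses made-up detections, and the helper names `box_corners` and `largest_box` are hypothetical; picking the largest box is one possible way to keep a single face per image, not something the original code does.

```python
def box_corners(box):
    """Convert an (x, y, w, h) box into its top-left and bottom-right corners."""
    x, y, w, h = box
    return (x, y), (x + w, y + h)


def largest_box(boxes):
    """Return the box with the largest area, or None when nothing was detected."""
    if len(boxes) == 0:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])


# made-up detections in the (x, y, w, h) layout
faces = [(40, 30, 50, 50), (200, 80, 120, 130)]
print(box_corners(faces[0]))
print(largest_box(faces))
```

The two corners are exactly the points passed to cv2.rectangle in the loop above.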

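Split the dataset into a training set and a validation set:

The training set is used to train the machine learning model. The model learns patterns, features, and relationships in the data from it, adjusting its parameters to minimize the error of its predictions or classifications on the training data.
The validation set evaluates the model's performance on data it has not seen, which shows how well the model generalizes. It must be an independent set that is not used during training; mixing validation data into training skews the results.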
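A split of this kind can be sketched with the standard library alone. The 20% validation fraction, the fixed seed, and the function name `split_train_val` are assumptions made for the example; the source does not state which ratio it uses.

```python
import random


def split_train_val(items, val_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, validation) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# stand-ins for the 40 images of one subject
samples = [f"img_{i:02d}" for i in range(40)]
train, val = split_train_val(samples, val_fraction=0.2)
print(len(train), len(val))
```

Shuffling before slicing keeps the two sets disjoint while avoiding any ordering bias from the download sequence.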


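Data augmentation

The accuracy of a deep learning model depends on the quality, quantity, and contextual meaning of its training data. Obtaining such data is one of the most common challenges in building deep learning models and can be costly and time-consuming. Companies use data augmentation to reduce their dependence on training examples and to build high-precision models quickly.

Data augmentation means artificially increasing the amount of data by generating new data points from existing data. This includes making minor changes to the data or using machine learning models to generate new data points in the latent space of the original data.

Synthetic data is generated artificially without using real-world images, for example by generative adversarial networks.

Augmented data is derived from the original images with minor geometric transformations (such as flipping, translation, rotation, or added noise) to increase the diversity of the training set.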


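Data augmentation improves the performance of ML models through more diverse datasets and reduces the operating costs of data collection.

Flip left-right: images are randomly flipped horizontally with a probability of 0.7. This simulates variation caused by the different orientations of subjects in images.
Rotation: images are rotated slightly (up to 10 degrees in either direction) with a probability of 0.7. This adds variability to the dataset by simulating different head poses.
Greyscale conversion: with a probability of 0.1, images are converted to greyscale. This lets the model process and learn from images regardless of their color information.
Sampling: the sample(50) method generates 50 augmented images from the original set. This expands the dataset and gives the model more data to learn from.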
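The probabilistic steps above can be approximated with NumPy alone. This is a simplified sketch, not the pipeline used in the project: rotation is left out, greyscale is a plain channel average, and the function names and the random stand-in images are assumptions made for the example.

```python
import numpy as np


def flip_left_right(img):
    """Mirror an H x W x C image horizontally."""
    return img[:, ::-1]


def to_greyscale(img):
    """Average the channels and repeat the result so the shape stays H x W x 3."""
    grey = img.mean(axis=2, keepdims=True).astype(img.dtype)
    return np.repeat(grey, 3, axis=2)


def augment(images, n_samples, p_flip=0.7, p_grey=0.1, seed=0):
    """Draw n_samples images at random and apply each operation with its probability."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_samples):
        img = images[rng.integers(len(images))]
        if rng.random() < p_flip:
            img = flip_left_right(img)
        if rng.random() < p_grey:
            img = to_greyscale(img)
        out.append(img.copy())
    return out


# two random stand-in "images" with the 224x224 size used in this project
demo_rng = np.random.default_rng(1)
originals = [demo_rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8) for _ in range(2)]
augmented = augment(originals, n_samples=50)
print(len(augmented), augmented[0].shape)
```

Each operation is applied independently, so a single output image can be both flipped and converted to greyscale.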

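Implementing the VGG16 model

VGG16 is a convolutional neural network widely used for image recognition. It is considered one of the best computer vision model architectures. It consists of 16 layers of artificial neurons that process the image step by step to improve accuracy. In VGG16, "VGG" refers to the Visual Geometry Group at the University of Oxford, and "16" refers to the network's 16 weight layers.

VGG16 is used for image recognition and for classifying new images. The pre-trained version of the VGG16 network was trained on more than one million images from the ImageNet visual database. VGG16 can be applied to determine whether an image contains certain items, animals, plants, and more.

VGG16 architecture

There are 13 convolutional layers, 5 max pooling layers, and 3 dense layers. That makes 21 layers in total, of which only 16 carry weights, that is, 16 layers with learnable parameters. VGG16 takes an input tensor of size 224x224. The model consistently uses convolutional layers with 3x3 filters, stride 1, and same padding, together with max pooling layers with 2x2 filters and stride 2.

The Conv-1 layers have 64 filters, Conv-2 has 128 filters, Conv-3 has 256 filters, and Conv-4 and Conv-5 have 512 filters. Three fully connected layers follow: the first two have 4096 channels each, and the third performs 1000-way ILSVRC classification and therefore contains 1000 channels (one per class). The final layer is the softmax layer.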
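The layer counts above can be checked with a few lines of arithmetic. The sketch below assumes the standard VGG16 layout of 2, 2, 3, 3, and 3 convolutional layers per block with a 3-channel input; the function name `vgg16_summary` is made up for the example.

```python
# (number of conv layers, filters per layer) for the five blocks
CONV_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
DENSE_UNITS = [4096, 4096, 1000]


def vgg16_summary(input_size=224, input_channels=3):
    """Return (conv_layers, final_map_size, conv_params, dense_params)."""
    size, channels = input_size, input_channels
    conv_layers = conv_params = 0
    for n_layers, filters in CONV_BLOCKS:
        for _ in range(n_layers):
            # one 3x3 kernel per input channel, plus one bias per filter;
            # same padding with stride 1 leaves height and width unchanged
            conv_params += (3 * 3 * channels + 1) * filters
            channels = filters
            conv_layers += 1
        size //= 2  # 2x2 max pooling with stride 2 halves height and width
    dense_params = 0
    features = size * size * channels  # flattened final feature map
    for units in DENSE_UNITS:
        dense_params += (features + 1) * units
        features = units
    return conv_layers, size, conv_params, dense_params


conv_layers, final_size, conv_params, dense_params = vgg16_summary()
print(conv_layers + len(DENSE_UNITS), "weight layers")
print(final_size, "x", final_size, "final feature map")
print(conv_params + dense_params, "parameters in total")
```

Most of the parameters sit in the first dense layer, which is one reason to replace the dense head when the network is reused for another task.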

Start by preparing the base model.


For the model to classify the images correctly, it needs to be extended with additional layers.


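The Global Average Pooling 2D layer condenses the feature maps obtained from VGG16 into a single 1D vector with one value per map. This simplifies the output, reduces the total number of parameters, and helps prevent overfitting.

The dense layers are a sequence of added fully connected layers. Each has a specified number of units (1024, 512, and 256), chosen based on common practice and experimentation. These layers further process the features extracted by VGG16.

The final dense layer (the output layer) uses a sigmoid activation, which is suitable for binary classification (the two classes being "female" and "male").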
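The shapes involved in this head can be traced with a small NumPy forward pass. This is a sketch under stated assumptions: the weights are random and untrained, ReLU is assumed for the hidden layers (the source does not name their activation), and a 7x7x512 array stands in for the VGG16 feature maps.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Average each H x W map: (batch, H, W, C) -> (batch, C)."""
    return feature_maps.mean(axis=(1, 2))


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def head_forward(feature_maps, weights):
    """Pool the feature maps, apply the hidden dense layers, then a sigmoid output."""
    x = global_average_pool(feature_maps)
    for w, b in weights[:-1]:
        x = relu(x @ w + b)
    w_out, b_out = weights[-1]
    return sigmoid(x @ w_out + b_out)


rng = np.random.default_rng(0)
layer_sizes = [512, 1024, 512, 256, 1]  # pooled channels -> dense units -> one output
# random, untrained weights, used only to exercise the shapes
weights = [(rng.normal(0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

features = rng.random((2, 7, 7, 512))  # stand-in for VGG16 output on two images
probs = head_forward(features, weights)
print(probs.shape)
```

Pooling reduces 7 x 7 x 512 values per image to 512, so the first dense layer needs far fewer weights than it would after flattening.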

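Adam optimization

The Adam optimization algorithm is an extension of stochastic gradient descent that iteratively updates the network weights based on the training data. It is efficient on large problems with a lot of data or parameters and requires little memory.

The algorithm combines two gradient descent methods: momentum and Root Mean Square Propagation (RMSProp).

Momentum accelerates gradient descent by using an exponentially weighted average of the gradients.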

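$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\,\frac{\partial L}{\partial w_t}, \qquad w_{t+1} = w_t - \alpha\, m_t$$

where $m_t$ is the moving average of the gradients at step $t$, $w_t$ the weights, $\alpha$ the learning rate, and $\beta_1$ the decay rate of the average.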

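RMSProp is an adaptive learning rate algorithm that tries to improve on AdaGrad by taking an exponential moving average of the squared gradients instead of their cumulative sum.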

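$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\left(\frac{\partial L}{\partial w_t}\right)^2, \qquad w_{t+1} = w_t - \frac{\alpha}{\sqrt{v_t} + \epsilon}\,\frac{\partial L}{\partial w_t}$$

where $v_t$ is the moving average of the squared gradients and $\epsilon$ a small constant that avoids division by zero.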

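Since $m_t$ and $v_t$ are both initialized to 0 (as in the methods above), they tend to be biased toward 0 because $\beta_1$ and $\beta_2$ are both close to 1. The optimizer fixes this by computing bias-corrected versions of $m_t$ and $v_t$. This also keeps the weights under control as they approach the global minimum and prevents large oscillations near it. The formulas used are: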

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

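Intuitively, the gradient descent is adapted after every iteration so that it stays controlled and unbiased throughout the process, hence the name Adam (adaptive moment estimation).

Now, instead of the raw moment estimates $m_t$ and $v_t$, we use the bias-corrected estimates $\hat{m}_t$ and $\hat{v}_t$. Substituting them into the general update rule gives: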

$$w_{t+1} = w_t - \hat{m}_t\,\frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}$$

Source: GeeksforGeeks, https://www.geeksforgeeks.org/adam-optimizer/
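The update rule can be traced on a one-dimensional toy problem. The sketch below is a plain-Python illustration rather than the optimizer used in the project; the function name `adam_minimize`, the quadratic objective, and the hyperparameter values are assumptions chosen for the example.

```python
import math


def adam_minimize(grad_fn, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function with Adam, given its gradient grad_fn."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # momentum: moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # RMSProp: moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3)
w_final = adam_minimize(lambda w: 2 * (w - 3), w0=0.0)
print(w_final)
```

On the very first step the bias-corrected ratio equals the sign of the gradient, so the weight moves by almost exactly the learning rate regardless of how large the gradient is.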


Set up preprocessing, augmentation, and model training for the image data.


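epochs: the number of epochs specifies how many times the entire training dataset is passed forward and backward through the neural network. Here the model goes through the training data 10 times. An epoch is one complete presentation of the dataset to the learning machine.
batch_size: this parameter defines the number of samples propagated through the network at one time. Here the batch size is 30, which means the model takes 30 images, processes them, updates the weights, and then moves on to the next batch of 30 images.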
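How the two settings interact is simple arithmetic. In this sketch the training set of 100 items is hypothetical (the source does not give the final dataset size), and the helper names are made up for the example.

```python
import math


def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def count_weight_updates(n_samples, batch_size, epochs):
    """One update per batch, with ceil(n_samples / batch_size) batches per epoch."""
    return math.ceil(n_samples / batch_size) * epochs


samples = list(range(100))  # hypothetical training set of 100 items
batch_sizes = [len(batch) for batch in iterate_batches(samples, batch_size=30)]
print(batch_sizes)
print(count_weight_updates(len(samples), batch_size=30, epochs=10))
```

The last batch of an epoch is smaller whenever the dataset size is not a multiple of the batch size.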

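The model's performance is evaluated by making predictions on the validation set, which shows how well the model performs on unseen data. A threshold is applied to these predictions to classify each image into one of the two classes ("male" or "female").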
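Thresholding turns each sigmoid output into a class label. The sketch below assumes a threshold of 0.5 and assumes that outputs near 1 mean "male"; neither detail is stated in the source, and the function name `classify` is made up.

```python
def classify(probabilities, threshold=0.5, labels=("female", "male")):
    """Map each sigmoid output to labels[1] if it exceeds the threshold, else labels[0]."""
    return [labels[1] if p > threshold else labels[0] for p in probabilities]


print(classify([0.08, 0.51, 0.97, 0.5]))
```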


Create a confusion matrix to visualize the accuracy.
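A binary confusion matrix can be counted by hand before plotting it. The labels below are made up for illustration, with 0 and 1 standing for the two classes; the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of predictions on the diagonal, that is, the correct ones."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# made-up labels for illustration
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(accuracy(cm))
```

Rows are the actual classes and columns the predicted ones, so off-diagonal cells show which class is mistaken for the other.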


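For binary classification, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) help to understand the trade-off between the true positive rate and the false positive rate.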
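Both quantities can be computed directly from labels and scores. This sketch uses made-up values and hypothetical function names; AUC is computed here through its rank interpretation, the probability that a randomly chosen positive scores higher than a randomly chosen negative, which equals the area under the ROC curve.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (true positive rate, false positive rate) at one threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# made-up labels and scores for illustration
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
roc = [rates_at_threshold(y_true, scores, t) for t in sorted(set(scores), reverse=True)]
print(roc)
print(auc(y_true, scores))
```

Lowering the threshold moves along the curve: the true positive rate rises, but so does the false positive rate.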
