Machine learning (ML) is one of the most popular fields in the technology industry, and given Python's rich libraries and ease of use, Python proficiency is often a prerequisite. If you are preparing for an interview in this field, it is important to be comfortable with both theoretical concepts and practical implementations. Here are some common Python ML interview questions and answers to help you prepare.
Preprocessing techniques are essential for preparing data for machine learning models. The most common techniques include data normalization and encoding categorical variables as dummy variables:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Data normalization
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)

# Creating dummy variables
df_with_dummies = pd.get_dummies(data, drop_first=True)
A brute force algorithm exhaustively tries every possibility to find a solution. A common example is linear search, in which the algorithm checks each element of an array to find a match.
def linear_search(arr, target):
    for i in range(len(arr)):
        if arr[i] == target:
            return i
    return -1

# Example usage
arr = [2, 3, 4, 10, 40]
target = 10
result = linear_search(arr, target)
An imbalanced dataset has a skewed ratio of classes. Strategies for addressing this include resampling techniques such as SMOTE (Synthetic Minority Over-sampling Technique):
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# Oversample the minority class with SMOTE, then split into train and test sets
X_resampled, y_resampled = SMOTE().fit_resample(X, y)
X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2)
Common strategies for handling missing data include omission and imputation:

from sklearn.impute import SimpleImputer

# Imputing missing values with the column median
imputer = SimpleImputer(strategy='median')
data_imputed = imputer.fit_transform(data)
Regression is a supervised learning technique used to find relationships between variables and predict a dependent variable. Common examples include linear regression and logistic regression, both of which can be implemented with Scikit-learn.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
In Python, you can use Scikit-learn's train_test_split function to split data into training and testing sets.
from sklearn.model_selection import train_test_split

# Split the dataset: 60% training and 40% testing
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.4)
Important parameters for tree-based learners include max_depth (maximum tree depth), n_estimators (number of trees in an ensemble), and max_features (number of features considered at each split):
from sklearn.ensemble import RandomForestClassifier

# Setting parameters for Random Forest
model = RandomForestClassifier(max_depth=5, n_estimators=100, max_features='sqrt', random_state=42)
model.fit(X_train, y_train)
Two common methods for hyperparameter tuning are grid search and random search:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Grid Search
param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [5, 10, 15]}
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Random Search
param_dist = {'n_estimators': [50, 100, 200], 'max_depth': [5, 10, 15]}
random_search = RandomizedSearchCV(model, param_dist, n_iter=10, cv=5, random_state=42)
random_search.fit(X_train, y_train)
You need to remove the days on which no rain fell, then find the median.
def median_rainfall(df_rain):
    # Remove days with no rain
    df_rain_filtered = df_rain[df_rain['rainfall'] > 0]
    # Find the median amount of rainfall
    median_rainfall = df_rain_filtered['rainfall'].median()
    return median_rainfall
You can calculate the median with pandas and use it to fill in the missing values.
def impute_median_price(df, column):
    # Compute the column median and use it to fill missing values
    median_price = df[column].median()
    df[column] = df[column].fillna(median_price)
    return df
def fill_none(input_list):
    prev_value = None
    result = []
    for value in input_list:
        if value is None:
            # Replace None with the most recent non-None value
            result.append(prev_value)
        else:
            result.append(value)
            prev_value = value
    return result
def grades_colors(df_students):
    # Keep students with a grade above 90 whose favorite color is green or red
    filtered_df = df_students[
        (df_students["grade"] > 90)
        & (df_students["favorite_color"].isin(["green", "red"]))
    ]
    return filtered_df
import pandas as pd

def calculate_t_value(df, column, mu_0):
    # One-sample t-statistic: (sample mean - hypothesized mean) / standard error
    sample_mean = df[column].mean()
    sample_std = df[column].std()
    n = len(df)
    t_value = (sample_mean - mu_0) / (sample_std / (n ** 0.5))
    return t_value

# Example usage (df and mu_0 are assumed to be defined)
t_value = calculate_t_value(df, 'var', mu_0)
print(t_value)
import numpy as np
import pandas as pd

def euclidean_distance(point1, point2):
    return np.sqrt(np.sum((point1 - point2) ** 2))

def kNN(k, data, new_point):
    # Distance from each row's features (all columns except the label) to the new point
    distances = data.apply(lambda row: euclidean_distance(row[:-1], new_point), axis=1)
    sorted_indices = distances.sort_values().index
    top_k = data.loc[sorted_indices[:k]]
    # Majority vote among the k nearest neighbors
    return top_k['label'].mode()[0]

# Example usage
data = pd.DataFrame({
    'feature1': [1, 2, 3, 4],
    'feature2': [2, 3, 4, 5],
    'label': [0, 0, 1, 1]
})
new_point = [2.5, 3.5]
k = 3
result = kNN(k, data, new_point)
print(result)
Note: This example uses simplified assumptions to meet the interview constraints.
import pandas as pd

def create_tree(dataframe, new_point):
    # Simplified "tree": route the new point by the first feature set to 1
    for col in dataframe.columns[:-1]:  # Exclude the 'class' column
        if new_point[col] == 1:
            sub_data = dataframe[dataframe[col] == 1]
            if len(sub_data) > 0:
                return sub_data['class'].mode()[0]
    # Default to the most frequent class
    return dataframe['class'].mode()[0]

def random_forest(df, new_point, n_trees):
    results = []
    for _ in range(n_trees):
        tree_result = create_tree(df, new_point)
        results.append(tree_result)
    # Majority vote
    return max(set(results), key=results.count)

# Example usage
df = pd.DataFrame({
    'feature1': [0, 1, 1, 0],
    'feature2': [0, 0, 1, 1],
    'class': [0, 1, 1, 0]
})
new_point = {'feature1': 1, 'feature2': 0}
n_trees = 5
result = random_forest(df, new_point, n_trees)
print(result)
import pandas as pd
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_regression(X, y, num_iterations, learning_rate):
    weights = np.zeros(X.shape[1])
    for i in range(num_iterations):
        # Predicted probabilities for the current weights
        z = np.dot(X, weights)
        predictions = sigmoid(z)
        # Gradient of the log-likelihood with respect to the weights
        errors = y - predictions
        gradient = np.dot(X.T, errors)
        weights += learning_rate * gradient
    return weights

# Example usage
df = pd.DataFrame({
    'feature1': [0, 1, 1, 0],
    'feature2': [0, 0, 1, 1],
    'class': [0, 1, 1, 0]
})
X = df[['feature1', 'feature2']].values
y = df['class'].values
num_iterations = 1000
learning_rate = 0.01
weights = logistic_regression(X, y, num_iterations, learning_rate)
print(weights)
import numpy as np

def k_means(data_points, k, initial_centroids):
    centroids = initial_centroids
    while True:
        # Assign each point to its nearest centroid
        distances = np.linalg.norm(data_points[:, np.newaxis] - centroids, axis=2)
        clusters = np.argmin(distances, axis=1)
        # Recompute each centroid as the mean of its assigned points
        new_centroids = np.array([data_points[clusters == i].mean(axis=0) for i in range(k)])
        if np.all(centroids == new_centroids):
            break
        centroids = new_centroids
    return clusters

# Example usage
data_points = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
k = 2
initial_centroids = np.array([[1, 2], [10, 2]])
clusters = k_means(data_points, k, initial_centroids)
print(clusters)
Machine Learning is a field of artificial intelligence focused on building algorithms that enable computers to learn from data without explicit programming. It uses algorithms to analyze and identify patterns in data and make predictions based on those patterns.
"Machine learning is a branch of artificial intelligence that involves creating algorithms capable of learning from and making predictions based on data. It works by training a model on a dataset and then using that model to make predictions on new data."
There are three main types of machine learning algorithms:
Supervised Learning: Uses labeled data and makes predictions based on this information. Examples include linear regression and classification algorithms.
Unsupervised Learning: Processes unlabeled data and seeks to find patterns or relationships in it. Examples include clustering algorithms like K-means.
Reinforcement Learning: The algorithm learns from interacting with its environment, receiving rewards or punishments for certain actions. Examples include training AI agents in games.
"There are three main types of machine learning algorithms: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to make predictions, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns from interactions with the environment to maximize rewards."
Cross-validation is a technique to evaluate the performance of a machine learning model by splitting the dataset into multiple subsets (folds): the model is trained on all but one fold and validated on the remaining fold, rotating so that each fold serves as the validation set exactly once.
"Cross-validation is a technique used to evaluate a machine learning model'sperformance by dividing the dataset into training and validation sets. It helps ensure the model generalizes well to new data, preventing overfitting and providing a more accurate measure of performance."
Artificial Neural Networks (ANNs) are models inspired by the human brain's structure. They consist of layers of interconnected nodes (neurons) that process input data and generate output predictions.
"An artificial neural network is a machine learning model inspired by the structure and function of the human brain. It comprises layers of interconnected neurons that process input data through weighted connections to make predictions."
Decision Trees are models for classification and regression tasks that split data into subsets based on the values of input variables to generate prediction rules.
"A decision tree is a tree-like model used for classification and regression tasks. It works by recursively splitting data into subsets based on input variables, creating rules for making predictions."
K-Nearest Neighbors (KNN) is a simple machine learning algorithm used for classification or regression tasks. It determines the k closest data points in the feature space to a given unseen data point and classifies it based on the majority class of its k nearest neighbors.
"The K-Nearest Neighbors (KNN) algorithm is a machine learning technique used for classification or regression. It works by identifying the k closest data points to a given point in the feature space and classifying it based on the majority class among the k nearest neighbors."
Support Vector Machines (SVM) are linear models used for binary classification and regression tasks. They find the most suitable boundary (hyperplane) that separates data into classes. Data points closest to the hyperplane, called support vectors, play a critical role in defining this boundary.
"The Support Vector Machine (SVM) algorithm is a linear model used for binary classification and regression tasks. It identifies the best hyperplane that separates data into classes, relying heavily on the data points closest to the hyperplane, known as support vectors."
Regularization is a technique to prevent overfitting in machine learning models by adding a penalty term to the loss function. This penalty discourages the model from learning overly complex relationships in the data.
"Regularization is a technique to prevent overfitting in machine learning models by adding a penalty term to the loss function, which discourages the model from learning overly complex patterns. Common types of regularization include L1 (Lasso) and L2 (Ridge) regularization."
from sklearn.linear_model import Ridge

# Applying L2 Regularization (Ridge Regression)
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)
Gradient Descent is an optimization algorithm used to minimize a cost function in machine learning. It iteratively adjusts the parameters of the model in the direction of the negative gradient of the cost function until it reaches a minimum.
"Gradient Descent is an optimization algorithm used to minimize a cost function in machine learning. It iteratively updates the model parameters in the direction of the negative gradient of the cost function, aiming to find the parameters that minimize the cost."
Ensemble Learning is a technique where multiple models (often called "weak learners") are combined to solve a prediction task. The combined model is generally more robust and performs better than individual models.
"Ensemble learning is a machine learning technique where multiple models are combined to solve a prediction task. Common ensemble methods include bagging, boosting, and stacking. Combining the predictions of individual models can improve performance and reduce the risk of overfitting."
from sklearn.ensemble import RandomForestClassifier

# Ensemble learning using Random Forest
model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
Preparing for a Python machine learning interview involves understanding both theoretical concepts and practical implementations. This guide has covered several essential questions and answers that frequently come up in interviews. By familiarizing yourself with these topics and practicing the provided code examples, you'll be well-equipped to handle a wide range of questions in your next machine learning interview. Good luck!