
How to Compare the Performance of Machine Learning Models (How to Rapidly Optimize Model Parameters)



Author | Thomas Ciha

Translator | 刘旭坤

Editor | Jane

Produced by | AI科技大本营

[Introduction] Generally speaking, there are no shortcuts to optimizing a machine learning model. Which architecture to use, and which optimization algorithm and parameters to choose, depends both on our understanding of the dataset and on continual trial and error. The ability to build and test models quickly is therefore crucial to keeping a project moving. In this article we build a model-production pipeline that helps you optimize parameters rapidly.

For a deep learning model, the following parameters are under our control:

  • Number of hidden layers
  • Number of units in each layer
  • Activation functions
  • Optimization algorithm
  • Learning rate
  • Regularization method
  • Regularization parameters

We start by writing these parameters into a dictionary, model_info, which stores the model's configuration:

model_info = {}
model_info['Hidden layers'] = [100] * 6              # number of hidden units per layer
model_info['Input size'] = og_one_hot.shape[1] - 1   # every column except the label
model_info['Activations'] = ['relu'] * 6             # activation function for each layer
model_info['Optimization'] = 'adadelta'              # optimization method
model_info["Learning rate"] = .005
model_info["Batch size"] = 32
model_info["Preprocessing"] = 'Standard'             # preprocessing method
model_info['Regularization'] = 'l2'                  # regularization method
model_info['Reg param'] = 0.0005                     # regularization parameter (lambda for L2)

Here we want to perform binary classification on a dataset; you can download the data as a CSV file from the link below:

https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset
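The snippets in this article reference a one-hot-encoded array called og_one_hot that is never constructed explicitly. Below is a minimal sketch of how it could be produced from the downloaded CSV; the file name UCI_Credit_Card.csv and the column names are assumptions based on the Kaggle dataset, not part of the original code:

import pandas as pd

# Assumed file name from the Kaggle download; adjust the path as needed.
raw = pd.read_csv('UCI_Credit_Card.csv')

# One-hot encode the categorical columns (column names are assumptions for this dataset),
# keeping the binary target 'default.payment.next.month' as the last column.
categorical = ['SEX', 'EDUCATION', 'MARRIAGE']
features = pd.get_dummies(raw.drop(columns=['ID', 'default.payment.next.month']),
                          columns=categorical)
og_one_hot = pd.concat([features, raw['default.payment.next.month']], axis=1)
og_one_hot.to_csv('one_hot_credit_default.csv', index=False)  # later reloaded via pd.read_csv(data_path)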

The most intuitive way to get to know a dataset is to visualize it. For dimensionality reduction I used both PCA and t-SNE, and as the figure below shows, t-SNE separates the data best. (Personally, I think scikit-learn's StandardScaler is perfectly adequate for preprocessing the data.)

[Figure: PCA vs. t-SNE projections of the standardized data]
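The plotting code behind that figure isn't included in the article; the following is a rough sketch of how the comparison could be reproduced with scikit-learn's StandardScaler, PCA and TSNE (the og_one_hot array from the sketch above and the 3,000-row subsample are assumptions):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

data = np.array(og_one_hot)
X, y = data[:, :-1], data[:, -1]                       # features and binary label
X = StandardScaler().fit_transform(X)                  # standardize before projecting

idx = np.random.choice(len(X), 3000, replace=False)    # subsample: t-SNE is slow on 30k rows
X_s, y_s = X[idx], y[idx]

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, (name, proj) in zip(axes, [('PCA', PCA(n_components=2).fit_transform(X_s)),
                                   ('t-SNE', TSNE(n_components=2).fit_transform(X_s))]):
    ax.scatter(proj[:, 0], proj[:, 1], c=y_s, cmap='coolwarm', s=5, alpha=0.6)
    ax.set_title(name)
plt.show()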

Next we can use the parameters in model_info to construct a deep learning model. The build_nn function below builds, compiles and returns a deep learning model according to the parameters in the model_info it is given:

from keras.models import Sequential
from keras.layers import InputLayer, Dense, Activation, BatchNormalization, Dropout
from keras.regularizers import l2
from keras import optimizers


def build_nn(model_info):
    """
    Builds and compiles a NN given a hash table of the model's parameters.
    :param model_info: dict describing the architecture, optimization and regularization
    :return: compiled model
    """
    try:
        if model_info["Regularization"] == "l2":               # L2 regularization
            lambda_ = model_info['Reg param']                  # get lambda parameter
            batch_norm, keep_prob = False, False               # disable other regularization tactics
        elif model_info['Regularization'] == 'Batch norm':     # batch normalization regularization
            lambda_ = 0
            batch_norm = model_info['Reg param']               # 'before' or 'after' the activation
            keep_prob = False
            if batch_norm not in ['before', 'after']:          # ensure we have a valid reg param
                raise ValueError
        elif model_info['Regularization'] == 'Dropout':        # Dropout regularization
            lambda_, batch_norm = 0, False
            keep_prob = model_info['Reg param']
    except KeyError:
        lambda_, batch_norm, keep_prob = 0, False, False       # no regularization is being used

    hidden, acts = model_info['Hidden layers'], model_info['Activations']
    model = Sequential(name=model_info['Name'])
    model.add(InputLayer((model_info['Input size'],)))         # create input layer
    first_hidden = True

    for lay, act, i in zip(hidden, acts, range(len(hidden))):  # create all the hidden layers
        if lambda_ > 0:                                        # L2 regularization: activation is part of the Dense layer
            if not first_hidden:
                model.add(Dense(lay, activation=act, W_regularizer=l2(lambda_), input_shape=(hidden[i - 1],)))
            else:
                model.add(Dense(lay, activation=act, W_regularizer=l2(lambda_), input_shape=(model_info['Input size'],)))
                first_hidden = False
        else:                                                  # no weight regularization
            if not first_hidden:
                model.add(Dense(lay, input_shape=(hidden[i - 1],)))              # add un-regularized layers
            else:
                model.add(Dense(lay, input_shape=(model_info['Input size'],)))   # first layer connects to the input
                first_hidden = False

            if batch_norm == 'before':
                model.add(BatchNormalization(input_shape=(lay,)))  # batch normalization before the activation

            model.add(Activation(act))                             # activation layer is part of the hidden layer

            if batch_norm == 'after':
                model.add(BatchNormalization(input_shape=(lay,)))  # batch normalization after the activation

            if keep_prob:
                model.add(Dropout(keep_prob, input_shape=(lay,)))  # dropout layer

    # --------- Adding Output Layer -------------
    model.add(Dense(1, input_shape=(hidden[-1],)))                 # add output layer
    if batch_norm == 'before':
        model.add(BatchNormalization(input_shape=(hidden[-1],)))
    model.add(Activation('sigmoid'))                               # output layer activation
    if batch_norm == 'after':
        model.add(BatchNormalization(input_shape=(hidden[-1],)))

    if model_info['Optimization'] == 'adagrad':                    # set the optimization method
        opt = optimizers.Adagrad(lr=model_info["Learning rate"])
    elif model_info['Optimization'] == 'rmsprop':
        opt = optimizers.RMSprop(lr=model_info["Learning rate"])
    elif model_info['Optimization'] == 'adadelta':
        opt = optimizers.Adadelta()
    elif model_info['Optimization'] == 'adamax':
        opt = optimizers.Adamax(lr=model_info["Learning rate"])
    else:
        opt = optimizers.Nadam(lr=model_info["Learning rate"])
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])  # compile model

    return model
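A quick usage sketch (note that build_nn expects a 'Name' key, which the model_info dictionary shown earlier doesn't set, so it is added here):

model_info['Name'] = 'Baseline NN'   # build_nn reads this for Sequential(name=...)
model = build_nn(model_info)
model.summary()                      # inspect the resulting architecture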

With this build_nn function in hand, we can pass it different model_info dictionaries and quickly create models. Below, I use five different hidden-layer counts to test how model architecture affects classification performance.

def create_five_nns(input_size, hidden_size, act=None):
    """
    Creates 5 neural networks to be used as a baseline in determining the influence
    model depth & width has on performance.
    :param input_size: input layer size
    :param hidden_size: number of units per hidden layer
    :param act: activation function to use for each layer
    :return: list of model_info hash tables
    """
    act = ['relu'] if not act else [act]              # default activation = 'relu'
    nns = []                                          # list of model_info hash tables
    model_info = {}                                   # hash table storing model information
    model_info['Hidden layers'] = [hidden_size]
    model_info['Input size'] = input_size
    model_info['Activations'] = act
    model_info['Optimization'] = 'adadelta'
    model_info["Learning rate"] = .005
    model_info["Batch size"] = 32
    model_info["Preprocessing"] = 'Standard'
    model_info2, model_info3, model_info4, model_info5 = model_info.copy(), model_info.copy(), model_info.copy(), model_info.copy()

    model_info["Name"] = 'Shallow NN'                 # build shallow nn
    nns.append(model_info)

    model_info2['Hidden layers'] = [hidden_size] * 3  # build medium nn
    model_info2['Activations'] = act * 3
    model_info2["Name"] = 'Medium NN'
    nns.append(model_info2)

    model_info3['Hidden layers'] = [hidden_size] * 6  # build deep nn
    model_info3['Activations'] = act * 6
    model_info3["Name"] = 'Deep NN 1'
    nns.append(model_info3)

    model_info4['Hidden layers'] = [hidden_size] * 11  # build really deep nn
    model_info4['Activations'] = act * 11
    model_info4["Name"] = 'Deep NN 2'
    nns.append(model_info4)

    model_info5['Hidden layers'] = [hidden_size] * 20  # build an extremely deep nn
    model_info5['Activations'] = act * 20
    model_info5["Name"] = 'Deep NN 3'
    nns.append(model_info5)
    return nns

Perhaps because our data is fairly nonlinear, I found that test performance scales with the number of hidden layers and units: the more hidden layers, the better the results. Each parameter configuration was evaluated with five-fold cross-validation. Briefly, five-fold cross-validation splits the dataset into five parts, trains on four of them and tests on the remaining one, rotating five times so that each part serves as the test set exactly once; the mean of the five test results is taken as the model's score. Here we measured both accuracy and AUC, with the results shown below:

[Figure: five-fold cross-validation accuracy and AUC for the five architectures]
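The cross-validation loop itself isn't shown in the article; the following is a minimal sketch of how the five-fold evaluation could be wired up around build_nn, using scikit-learn's StratifiedKFold and a plain StandardScaler in place of the article's preprocess_data helper (the 20-epoch budget is an arbitrary choice):

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

def cross_validate(model_info, X, y, n_splits=5):
    """Return mean accuracy and AUC over stratified k-fold splits."""
    accs, aucs = [], []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits, shuffle=True).split(X, y):
        scaler = StandardScaler().fit(X[train_idx])             # fit preprocessing on the training fold only
        X_train, X_test = scaler.transform(X[train_idx]), scaler.transform(X[test_idx])
        model = build_nn(model_info)                            # fresh model for every fold
        model.fit(X_train, y[train_idx], epochs=20,
                  batch_size=model_info['Batch size'], verbose=0)
        _, acc = model.evaluate(X_test, y[test_idx], verbose=0)
        accs.append(acc)
        aucs.append(roc_auc_score(y[test_idx], model.predict(X_test).ravel()))
    return np.mean(accs), np.mean(aucs)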

If cross-validation takes too long and you have enough data, you can instead split the dataset directly into a training set and a test set, as the code below does:

from keras.callbacks import EarlyStopping, TensorBoard, ModelCheckpoint
from sklearn.metrics import roc_curve, auc


def quick_nn_test(model_info, data_dict, save_path):
    model = build_nn(model_info)                                 # use model_info to build and compile a nn
    stop = EarlyStopping(patience=5, monitor='acc', verbose=1)   # stop if accuracy hasn't improved for 5 epochs
    tensorboard_path = save_path + model_info['Name']            # create path for tensorboard callback
    tensorboard = TensorBoard(log_dir=tensorboard_path, histogram_freq=0,
                              write_graph=True, write_images=True)         # create tensorboard callback
    save_model = ModelCheckpoint(filepath=save_path + model_info['Name'] + '\\' +
                                 model_info['Name'] + '_saved_' + '.h5')   # save model after every epoch

    model.fit(data_dict['Training data'], data_dict['Training labels'], epochs=150,   # fit model
              batch_size=model_info['Batch size'], callbacks=[save_model, stop, tensorboard])

    train_acc = model.evaluate(data_dict['Training data'], data_dict['Training labels'],   # evaluate train accuracy
                               batch_size=model_info['Batch size'], verbose=0)
    test_acc = model.evaluate(data_dict['Test data'], data_dict['Test labels'],            # evaluate test accuracy
                              batch_size=model_info['Batch size'], verbose=0)

    # Get Train AUC
    y_pred = model.predict(data_dict['Training data']).ravel()              # predict on training data
    fpr, tpr, thresholds = roc_curve(data_dict['Training labels'], y_pred)  # compute fpr and tpr
    auc_train = auc(fpr, tpr)                                               # compute AUC metric
    # Get Test AUC
    y_pred = model.predict(data_dict['Test data']).ravel()                  # same as above with test data
    fpr, tpr, thresholds = roc_curve(data_dict['Test labels'], y_pred)
    auc_test = auc(fpr, tpr)

    return train_acc, test_acc, auc_train, auc_test
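quick_nn_test and the driver script below also rely on preprocess_data and split_data helpers that the article never lists. The following is a guess at what they might look like, assuming the label sits in the last column and that 'Standard' means scikit-learn's StandardScaler:

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

def preprocess_data(one_hot, method, return_labels=True):
    """Scale the feature columns; the last column is assumed to be the binary label."""
    X, y = one_hot[:, :-1], one_hot[:, -1]
    scaler = StandardScaler() if method == 'Standard' else MinMaxScaler()
    return scaler.fit_transform(X), y

def split_data(train_frac, seed, data):
    """Shuffle and split rows into the train/test dictionary quick_nn_test expects."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(data))
    cut = int(train_frac * len(data))
    train, test = data[idx[:cut]], data[idx[cut:]]
    return {'Training data': train[:, :-1], 'Training labels': train[:, -1],
            'Test data': test[:, :-1], 'Test labels': test[:, -1]}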

Some textbooks recommend grid search for hyperparameter optimization, but grid search is really just brute-force enumeration and is rarely practical in real projects. What we use far more often is a coarse-to-fine strategy: progressively narrowing the range of the best parameters.

1"""This section of code allows us to create and test many neural networks and save the results of a quick 2test into a CSV file. Once that CSV file has been created we will continue to add results onto the existing 3file.""" 4 5rapid_testing_path = 'YOUR PATH HERE' 6data_path = 'YOUR DATA PATH' 7 8try: # try to load existing csv 9 rapid_mlp_results = pd.read_csv(rapid_testing_path 'Results.csv') 10 index = rapid_mlp_results.shape[1] 11except: # if no csv exists yet create a DF 12 rapid_mlp_results = pd.DataFrame(columns=['Model' 'Train Accuracy' 'Test Accuracy' 'Train AUC' 'Test AUC' 13 'Preprocessing' 'Batch size' 'Learn Rate' 'Optimization' 'Activations' 14 'Hidden layers' 'Regularization']) 15 index = 0 16 17og_one_hot = np.array(pd.read_csv(data_path)) # load one hot data 18 19model_info = {} # create model_info dicts for all the models we want to test 20model_info['Hidden layers'] = [100] * 6 # specifies the number of hidden units per layer 21model_info['Input size'] = og_one_hot.shape[1] - 1 # input data size 22model_info['Activations'] = ['relu'] * 6 # activation function for each layer 23model_info['Optimization'] = 'adadelta' # optimization method 24model_info["Learning rate"] = .005 # learning rate for optimization method 25model_info["Batch size"] = 32 26model_info["Preprocessing"] = 'Standard' # specifies the preprocessing method to be used 27 28model_0 = model_info.copy() # create model 0 29model_0['Name'] = 'Model0' 30 31model_1 = model_info.copy() # create model 1 32model_1['Hidden layers'] = [110] * 3 33model_1['Name'] = 'Model1' 34 35model_2 = model_info.copy() # try best model so far with several regularization parameter values 36model_2['Hidden layers'] = [110] * 6 37model_2['Name'] = 'Model2' 38model_2['Regularization'] = 'l2' 39model_2['Reg param'] = 0.0005 40 41model_3 = model_info.copy() 42model_3['Hidden layers'] = [110] * 6 43model_3['Name'] = 'Model3' 44model_3['Regularization'] = 'l2' 45model_3['Reg param'] = 0.05 46 47# .... create more models .... 
48 49#-------------- REGULARIZATION OPTIONS ------------- 50# L2 Regularization: Regularization: 'l2' Reg param: lambda value 51# Dropout: Regularization: 'Dropout' Reg param: keep_prob 52# Batch normalization: Regularization: 'Batch norm' Reg param: 'before' or 'after' 53 54 55models = [model_0 model_1 model_2] # make a list of model_info hash tables 56 57column_list = ['Model' 'Train Accuracy' 'Test Accuracy' 'Train AUC' 'Test AUC' 'Preprocessing' 58 'Batch size' 'Learn Rate' 'Optimization' 'Activations' 'Hidden layers' 59 'Regularization' 'Reg Param'] 60 61for model in models: # for each model_info in list of models to test test model and record results 62 train_data labels = preprocess_data(og_one_hot model['Preprocessing'] True) # preprocess raw data 63 data_dict = split_data(0.9 0 np.concatenate((train_data labels.reshape(29999 1)) axis=1)) # split data 64 train_acc test_acc auc_train auc_test = quick_nn_test(model data_dict save_path=rapid_testing_path) # quickly assess model 65 66 try: 67 reg = model['Regularization'] # set regularization parameters if given 68 reg_param = model['Reg param'] 69 except: 70 reg = "None" # else set NULL params 71 reg_param = 'NA' 72 73 val_lis = [model['Name'] train_acc[1] test_acc[1] auc_train auc_test model['Preprocessing'] 74 model["Batch size"] model["Learning rate"] model["Optimization"] str(model["Activations"]) 75 str(model["Hidden layers"]) reg reg_param] 76 77 df_dict = {} 78 for col val in zip(column_list val_lis): # create df dict to append to csv file 79 df_dict[col] = val 80 81 df = pd.DataFrame(df_dict index=[index]) 82 rapid_mlp_results = rapid_mlp_results.append(df ignore_index=False) 83 rapid_mlp_results.to_csv(rapid_testing_path "Results.csv" index=False)

We first need a rough direction for the optimization and rough ranges for the parameters. Only then can we sample parameters at random within those ranges and narrow them further based on the results. The code below injects this randomness into the process of generating models (or rather, the model_info dictionaries used to generate them):

import numpy as np

def generate_random_model():
    optimization_methods = ['adagrad', 'rmsprop', 'adadelta', 'adam', 'adamax', 'nadam']  # possible optimization methods
    activation_functions = ['sigmoid', 'relu', 'tanh']           # possible activation functions
    batch_sizes = [16, 32, 64, 128, 256, 512]                    # possible batch sizes
    range_hidden_units = range(5, 250)                           # range of possible hidden units
    model_info = {}                                              # create hash table
    same_units = np.random.choice([0, 1], p=[1/5, 4/5])          # will all hidden layers have the same number of units?
    same_act_fun = np.random.choice([0, 1], p=[1/10, 9/10])      # will each hidden layer have the same activation function?
    really_deep = np.random.rand()
    # 80% of the time constrain the number of hidden layers to 1 - 9; 20% of the time permit really deep architectures
    range_layers = range(1, 10) if really_deep < 0.8 else range(6, 20)
    num_layers = (np.random.choice(range_layers, p=[.1, .2, .2, .2, .05, .05, .05, .1, .05])
                  if really_deep < 0.8 else np.random.choice(range_layers))        # choose number of layers
    model_info["Activations"] = ([np.random.choice(activation_functions, p=[0.25, 0.5, 0.25])] * num_layers
                                 if same_act_fun
                                 else [np.random.choice(activation_functions, p=[0.25, 0.5, 0.25])
                                       for _ in range(num_layers)])                # choose activation functions
    model_info["Hidden layers"] = ([np.random.choice(range_hidden_units)] * num_layers
                                   if same_units
                                   else [np.random.choice(range_hidden_units)
                                         for _ in range(num_layers)])              # create hidden layers
    model_info["Optimization"] = np.random.choice(optimization_methods)            # choose an optimization method at random
    model_info["Batch size"] = np.random.choice(batch_sizes)                       # choose batch size
    model_info["Learning rate"] = 10 ** (-4 * np.random.rand())                    # choose a learning rate on a logarithmic scale
    model_info["Training threshold"] = 0.5                                         # set threshold for training
    return model_info
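The article leaves out the loop that actually consumes these random configurations; the sketch below shows one way it could be tied together with build_nn and quick_nn_test from above (the trial count, the top-5 cutoff, and the reuse of data_dict and rapid_testing_path from the earlier script are all assumptions):

results = []
for trial in range(20):                                     # coarse pass: sample 20 random configurations
    info = generate_random_model()
    info['Name'] = 'Random_{}'.format(trial)
    info['Input size'] = og_one_hot.shape[1] - 1
    info['Preprocessing'] = 'Standard'
    train_acc, test_acc, auc_train, auc_test = quick_nn_test(info, data_dict, save_path=rapid_testing_path)
    results.append((auc_test, info))

results.sort(key=lambda r: r[0], reverse=True)              # rank configurations by test AUC
best = [info for _, info in results[:5]]                    # keep the top configurations
# Next round: sample new configurations only from the ranges spanned by `best`
# (layer counts, layer widths, learning rates), then repeat until the gains flatten out.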

To sum up, this rapid-optimization approach boils down to two ideas: automate the model building, and progressively narrow the search. Automation is handled by the build_nn function; narrowing is done by bounding the parameter ranges and sampling within them at random. Master this workflow and you should be able to optimize the parameters of machine learning models, and deep learning models in particular, much more quickly.

Original article:

https://towardsdatascience.com/how-to-rapidly-test-dozens-of-deep-learning-models-in-python-cb839b518531
