Python Machine Learning Basics Tutorial (3 Simple Hands-On Exercises with Python Libraries for Machine Learning)
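Translator | Wanqing
Editor | Shanshan
Produced by | 人工智能头条 (AI Headlines)
[Introduction] Today we introduce some excellent and interesting Python libraries for machine learning and deep learning, together with hands-on code tutorials for them. Where theory or academic material is involved, the relevant papers and blog posts are linked for reference.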
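01
sg2im: Generating Images from Scene Graphs
This excellent open-source project uses graph convolution to process the input graph, computes a scene layout by predicting bounding boxes and segmentation masks for the objects, and converts the layout to an image with a cascaded refinement network.
The input scene graph is processed by a graph convolution network, which passes information along the edges and computes an embedding vector for every object. These vectors are used to predict a bounding box and a segmentation mask for each object, and the boxes and masks are combined into a coarse scene layout. The layout is passed to a cascaded refinement network, which generates the output image at increasing spatial scales. The model is trained adversarially against a pair of discriminator networks so that the output images look realistic.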
Paper:
https://arxiv.org/abs/1804.01622
GitHub:
https://github.com/google/sg2im
For the cascaded refinement network, see the paper:
Photographic Image Synthesis with Cascaded Refinement Networks
https://arxiv.org/abs/1707.09405
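To make the idea of "passing information along the edges" concrete, here is a deliberately simplified sketch of one message-passing round in NumPy. It is not the sg2im implementation: it ignores predicate embeddings, uses plain linear maps instead of learned MLPs, and the names `message_passing_step`, `w_self`, and `w_neigh` are hypothetical. Each object simply averages the transformed vectors of the objects it shares an edge with and adds that to its own transformed vector.

```python
import numpy as np

def message_passing_step(obj_vecs, edges, w_self, w_neigh):
    """One simplified message-passing round over an undirected edge list.

    obj_vecs: (n, d) array, one embedding per object.
    edges:    list of (subject_index, object_index) pairs.
    w_self, w_neigh: (d, d_out) weight matrices (assumed, not learned here).
    """
    n = obj_vecs.shape[0]
    pooled = np.zeros((n, w_neigh.shape[1]))
    counts = np.zeros(n)
    for s, o in edges:
        # Each edge sends a message in both directions.
        pooled[s] += obj_vecs[o] @ w_neigh
        pooled[o] += obj_vecs[s] @ w_neigh
        counts[s] += 1
        counts[o] += 1
    # Average the incoming messages; isolated objects keep a zero message.
    pooled /= np.maximum(counts, 1)[:, None]
    # Combine with the object's own vector and apply a ReLU.
    return np.maximum(obj_vecs @ w_self + pooled, 0.0)

# Toy scene graph: sky(0) -- grass(1) -- zebra(2)
vecs = np.eye(3)
new_vecs = message_passing_step(vecs, [(0, 1), (2, 1)], np.eye(3), np.eye(3))
print(new_vecs)
```

After one round, the vector for grass mixes in information from both sky and zebra, which is the effect the graph convolution layers rely on before boxes and masks are predicted.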
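▌How do you run and test the code?
First, clone the repository:
git clone https://github.com/google/sg2im.git
The original code was developed and tested on Ubuntu 16.04 with Python 3.5 and PyTorch 0.4. It is best run inside a virtual environment, which can be set up as follows: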
python3 -m venv env               # Create a virtual environment
source env/bin/activate           # Activate virtual environment
pip install -r requirements.txt   # Install dependencies
echo $PWD > env/lib/python3.5/site-packages/sg2im.pth  # Add current directory to python path
# Work for a while ...
deactivate                        # Exit virtual environment
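Note: python-venv must be installed. The following variant, which adds --without-pip and targets Python 3.6, can also serve as a reference.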
python3 -m venv --without-pip env  # Added the --without-pip
source env/bin/activate            # Activate virtual environment
pip install -r requirements.txt    # Install dependencies
echo $PWD > env/lib/python3.6/site-packages/sg2im.pth  # Add current directory to python path
# Work for a while ...
deactivate                         # Exit virtual environment
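The two listings differ in the hard-coded python3.5 and python3.6 paths. As a hedged alternative, the .pth file can be written from Python itself, letting the standard-library `sysconfig` module report where site-packages lives. The helper name `write_pth` is hypothetical; the demo below writes into a temporary directory so that it has no side effects.

```python
import os
import sysconfig
import tempfile

def write_pth(project_dir, site_packages=None, name="sg2im.pth"):
    """Write a .pth file that adds project_dir to the import path.

    If site_packages is None, use the active interpreter's site-packages
    directory (inside an activated venv, that is the venv's own).
    """
    if site_packages is None:
        site_packages = sysconfig.get_paths()["purelib"]
    pth_path = os.path.join(site_packages, name)
    with open(pth_path, "w") as f:
        f.write(os.path.abspath(project_dir) + "\n")
    return pth_path

# Demo against a throwaway directory instead of the real site-packages.
with tempfile.TemporaryDirectory() as fake_site_packages:
    path = write_pth(".", site_packages=fake_site_packages)
    with open(path) as f:
        print(f.read().strip())
```

Inside the activated environment, calling `write_pth(os.getcwd())` from the repository root would play the same role as the `echo $PWD > ...` line.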
You also need to delete the line pkg-resources==0.0.0 from requirements.txt; otherwise the installation fails. For an explanation of why this entry appears and why it should be removed, see the link below.
Reference:
https://stackoverflow.com/questions/39577984/what-is-pkg-resources-0-0-0-in-output-of-pip-freeze-command/39638060
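If you prefer not to edit the file by hand, the offending line can be filtered out with a few lines of Python. This is a small sketch; the function name `strip_pkg_resources` is hypothetical and the sample requirement pins are made up for illustration.

```python
def strip_pkg_resources(requirements_text):
    """Return the requirements text without any pkg-resources pin."""
    kept = []
    for line in requirements_text.splitlines():
        # Compare only the package name, e.g. "pkg-resources" in "pkg-resources==0.0.0".
        name = line.strip().split("==")[0].lower().replace("_", "-")
        if name == "pkg-resources":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"

# Illustrative input only; these are not the real contents of requirements.txt.
sample = "numpy==1.14.2\npkg-resources==0.0.0\ntorch==0.4.0\n"
print(strip_pkg_resources(sample), end="")
```

To apply it to the real file, read requirements.txt, pass its contents through the function, and write the result back before running pip install.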
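Next, run the pretrained models.
First download them by running the script bash scripts/download_models.sh. The download takes about 355 MB of disk space and provides the following models: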
- sg2im-models/coco64.pt: trained on the COCO-Stuff dataset to generate 64x64 images.
- sg2im-models/vg64.pt: trained on the Visual Genome dataset to generate 64x64 images.
- sg2im-models/vg128.pt: trained on the Visual Genome dataset to generate 128x128 images.
Reference paper:
Image Generation from Scene Graphs
https://arxiv.org/pdf/1804.01622.pdf
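The script scripts/run_model.py makes it easy to run any pretrained model on new scene graphs, which are described in a simple, human-readable JSON format. To recreate the sheep images, run: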
python scripts/run_model.py \
  --checkpoint sg2im-models/vg128.pt \
  --scene_graphs scene_graphs/figure_6_sheep.json \
  --output_dir outputs
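The resulting images are written to the outputs directory.
Next, let's look at the scene graph file that was passed to the script: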
[
  {
    "objects": ["sky", "grass", "zebra"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep", "sheep"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep", "sheep", "tree"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2],
      [4, "behind", 2]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep", "sheep", "tree", "ocean"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2],
      [4, "behind", 2],
      [5, "by", 4]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep", "sheep", "tree", "ocean", "boat"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2],
      [4, "behind", 2],
      [5, "by", 4],
      [6, "in", 5]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep", "sheep", "tree", "ocean", "boat"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2],
      [4, "behind", 2],
      [5, "by", 4],
      [6, "on", 1]
    ]
  }
]
Let's start with the first entry:
{
  "objects": ["sky", "grass", "zebra"],
  "relationships": [
    [0, "above", 1],
    [2, "standing on", 1]
  ]
}
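Objects: sky [0], grass [1], zebra [2]. The number in brackets is the object's index in the "objects" list.
Relationships: sky [0] is above grass [1] ("above")
zebra [2] is standing on grass [1] ("standing on")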
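The same reading can be done mechanically. The sketch below parses a scene graph with the standard `json` module and prints each relationship as a subject, predicate, object triple; the helper name `describe` is hypothetical and the JSON string is the first entry from the file above.

```python
import json

scene_graphs_json = """
[{"objects": ["sky", "grass", "zebra"],
  "relationships": [[0, "above", 1], [2, "standing on", 1]]}]
"""

def describe(scene_graph):
    """Turn [subject_index, predicate, object_index] triples into readable strings."""
    objects = scene_graph["objects"]
    return [
        "{} [{}] {} {} [{}]".format(objects[s], s, p, objects[o], o)
        for s, p, o in scene_graph["relationships"]
    ]

for graph in json.loads(scene_graphs_json):
    for sentence in describe(graph):
        print(sentence)
# sky [0] above grass [1]
# zebra [2] standing on grass [1]
```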
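You can also write a similar scene graph of your own to try the model out. Save the following as scene_graphs/figure_blog.json: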
[{
  "objects": ["sky", "grass", "dog", "cat", "tree", "ocean", "boat"],
  "relationships": [
    [0, "above", 1],
    [2, "standing on", 1],
    [3, "by", 2],
    [4, "behind", 2],
    [5, "by", 4],
    [6, "on", 1]
  ]
}]
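Hand-written scene graphs are easy to get wrong, for example by referring to an object index that does not exist. The following sketch builds the same graph in Python and checks the indices before serializing it. The function `validate_scene_graph` is a hypothetical helper, not part of sg2im, and it only checks structure; whether the model's vocabulary contains every object name is a separate question.

```python
import json

def validate_scene_graph(graph):
    """Check that every relationship refers to existing object indices."""
    n = len(graph["objects"])
    for s, p, o in graph["relationships"]:
        if not (0 <= s < n and 0 <= o < n):
            raise ValueError("relationship {} refers to a missing object".format([s, p, o]))
        if not isinstance(p, str):
            raise ValueError("predicate must be a string, got {!r}".format(p))
    return graph

graph = {
    "objects": ["sky", "grass", "dog", "cat", "tree", "ocean", "boat"],
    "relationships": [
        [0, "above", 1],
        [2, "standing on", 1],
        [3, "by", 2],
        [4, "behind", 2],
        [5, "by", 4],
        [6, "on", 1],
    ],
}

# The file holds a list of scene graphs, so wrap the single graph in a list.
text = json.dumps([validate_scene_graph(graph)], indent=2)
print(text)
```

Writing `text` to a file gives a scene graph file in the same format as the examples above.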
Run:
python scripts/run_model.py \
  --checkpoint sg2im-models/vg128.pt \
  --scene_graphs scene_graphs/figure_blog.json \
  --output_dir outputs
The generated image looks a little odd, but the process is a lot of fun.
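02
TheAlgorithms/Python:
All algorithms implemented in Python
Programming is an essential skill in data science, and this great repository collects implementations of many important algorithms. They are meant for demonstration only: for performance reasons, the Python standard library and established packages often provide better implementations.
In this repository you can find machine learning code, neural networks, dynamic programming, sorting, hashing, and more. The code below shows how to build K-means from scratch in Python with NumPy.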
'''README, Author - Anurag Kumar(mailto:anuragkumarak95@gmail.com)

Requirements:
  - sklearn
  - numpy
  - matplotlib

Python:
  - 3.5

Inputs:
  - X, a 2D numpy array of features.
  - k, number of clusters to create.
  - initial_centroids, initial centroid values generated by utility function (mentioned in usage).
  - maxiter, maximum number of iterations to process.
  - heterogeneity, empty list that will be filled with heterogeneity values if passed to kmeans func.

Usage:
  1. define 'k' value, 'X' features array and 'heterogeneity' empty list

  2. create initial_centroids,
        initial_centroids = get_initial_centroids(
            X,
            k,
            seed=0  # seed value for initial centroid generation, None for randomness (default=None)
        )

  3. find centroids and clusters using kmeans function.
        centroids, cluster_assignment = kmeans(
            X,
            k,
            initial_centroids,
            maxiter=400,
            record_heterogeneity=heterogeneity,
            verbose=True  # whether to print logs in console or not. (default=False)
        )

  4. Plot the loss function, heterogeneity values for every iteration saved in heterogeneity list.
        plot_heterogeneity(
            heterogeneity,
            k
        )

  5. Have fun..
'''
from __future__ import print_function
from sklearn.metrics import pairwise_distances
import numpy as np

TAG = 'K-MEANS-CLUST/ '

def get_initial_centroids(data, k, seed=None):
    '''Randomly choose k data points as initial centroids'''
    if seed is not None:  # useful for obtaining consistent results
        np.random.seed(seed)
    n = data.shape[0]  # number of data points

    # Pick k indices from range [0, n).
    rand_indices = np.random.randint(0, n, k)

    # Keep centroids as dense format, as many entries will be nonzero due to averaging.
    # As long as at least one document in a cluster contains a word,
    # it will carry a nonzero weight in the TF-IDF vector of the centroid.
    centroids = data[rand_indices, :]

    return centroids

def centroid_pairwise_dist(X, centroids):
    return pairwise_distances(X, centroids, metric='euclidean')

def assign_clusters(data, centroids):
    # Compute distances between each data point and the set of centroids.
    distances_from_centroids = centroid_pairwise_dist(data, centroids)

    # Assign each data point to its nearest centroid.
    cluster_assignment = np.argmin(distances_from_centroids, axis=1)

    return cluster_assignment

def revise_centroids(data, k, cluster_assignment):
    new_centroids = []
    for i in range(k):
        # Select all data points that belong to cluster i.
        member_data_points = data[cluster_assignment == i]
        # Compute the mean of the data points.
        centroid = member_data_points.mean(axis=0)
        new_centroids.append(centroid)
    new_centroids = np.array(new_centroids)

    return new_centroids

def compute_heterogeneity(data, k, centroids, cluster_assignment):
    heterogeneity = 0.0
    for i in range(k):
        # Select all data points that belong to cluster i.
        member_data_points = data[cluster_assignment == i, :]

        if member_data_points.shape[0] > 0:  # check if i-th cluster is non-empty
            # Compute distances from the centroid to its member data points.
            distances = pairwise_distances(member_data_points, [centroids[i]], metric='euclidean')
            squared_distances = distances ** 2
            # Accumulate the within-cluster sum of squared distances.
            heterogeneity += np.sum(squared_distances)

    return heterogeneity

from matplotlib import pyplot as plt

def plot_heterogeneity(heterogeneity, k):
    plt.figure(figsize=(7, 4))
    plt.plot(heterogeneity, linewidth=4)
    plt.xlabel('# Iterations')
    plt.ylabel('Heterogeneity')
    plt.title('Heterogeneity of clustering over time, K={0:d}'.format(k))
    plt.rcParams.update({'font.size': 16})
    plt.show()

def kmeans(data, k, initial_centroids, maxiter=500, record_heterogeneity=None, verbose=False):
    '''This function runs k-means on given data and initial set of centroids.
       maxiter: maximum number of iterations to run. (default=500)
       record_heterogeneity: (optional) a list, to store the history of heterogeneity as function of iterations
                             if None, do not store the history.
       verbose: if True, print how many data points changed their cluster labels in each iteration'''
    centroids = initial_centroids[:]
    prev_cluster_assignment = None

    for itr in range(maxiter):
        if verbose:
            print(itr, end='')

        # 1. Make cluster assignments using nearest centroids
        cluster_assignment = assign_clusters(data, centroids)

        # 2. Compute a new centroid for each of the k clusters, averaging all data points assigned to that cluster.
        centroids = revise_centroids(data, k, cluster_assignment)

        # Check for convergence: if none of the assignments changed, stop
        if prev_cluster_assignment is not None and \
                (prev_cluster_assignment == cluster_assignment).all():
            break

        # Print number of new assignments
        if prev_cluster_assignment is not None:
            num_changed = np.sum(prev_cluster_assignment != cluster_assignment)
            if verbose:
                print('    {0:5d} elements changed their cluster assignment.'.format(num_changed))

        # Record heterogeneity convergence metric
        if record_heterogeneity is not None:
            score = compute_heterogeneity(data, k, centroids, cluster_assignment)
            record_heterogeneity.append(score)

        prev_cluster_assignment = cluster_assignment[:]

    return centroids, cluster_assignment

# Mock test below
if False:  # change to True to run this test case.
    import sklearn.datasets as ds
    dataset = ds.load_iris()
    k = 3
    heterogeneity = []
    initial_centroids = get_initial_centroids(dataset['data'], k, seed=0)
    centroids, cluster_assignment = kmeans(dataset['data'], k, initial_centroids, maxiter=400,
                                           record_heterogeneity=heterogeneity, verbose=True)
    plot_heterogeneity(heterogeneity, k)
GitHub: https://github.com/TheAlgorithms
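Two details of the listing are worth a closer look. `np.random.randint` can pick the same index twice, so two initial centroids may coincide, and `revise_centroids` takes the mean of an empty selection when a cluster loses all its points. The sketch below is a compact NumPy-only variant, not the repository's code, that samples the initial indices without replacement and keeps the old centroid for an empty cluster. The names `init_centroids`, `assign`, and `kmeans_sketch` are hypothetical, and the two-blob data set is synthetic.

```python
import numpy as np

def init_centroids(data, k, seed=None):
    """Pick k distinct data points as initial centroids."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(data.shape[0], size=k, replace=False)
    return data[idx].astype(float)  # fancy indexing returns a copy

def assign(data, centroids):
    """Return nearest-centroid labels and the total squared distance."""
    # Broadcasting: (n, 1, d) - (1, k, d) -> (n, k, d)
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1).sum()

def kmeans_sketch(data, k, seed=None, maxiter=100):
    centroids = init_centroids(data, k, seed)
    labels = None
    for _ in range(maxiter):
        new_labels, _ = assign(data, centroids)
        if labels is not None and np.array_equal(labels, new_labels):
            break  # no assignment changed: converged
        labels = new_labels
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    # Final labels and heterogeneity, consistent with the returned centroids.
    labels, heterogeneity = assign(data, centroids)
    return centroids, labels, heterogeneity

# Synthetic data: two well-separated 2D blobs of 20 points each.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centroids, labels, heterogeneity = kmeans_sketch(blobs, k=2, seed=0)
print(centroids)
print(labels)
print(heterogeneity)
```

The returned heterogeneity is the same quantity that `compute_heterogeneity` accumulates in the listing: the sum of squared Euclidean distances from each point to its assigned centroid.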