mirna预测分析(包搞定miRNA与靶基因预测问题)
mirna预测分析(包搞定miRNA与靶基因预测问题)multimir_switchDBVersion()函数切换版本。multimir_dbInfoVersions()函数查看历史版本,当前2.3.0版本为2020年4月15号更新;multiMiR包发布于2014年7月,当前版本于2020年4月更新,收录4 个外部数据库,其中8 个为预测的 miRNA-靶基因相互作用关系(DIANA-microT-CDS、ElMMo、MicroCosm、miRanda、miRDB、PicTar、PITA 和 TargetScan),其中有3 个为实验验证的 miRNA-靶基因相互作用关系(miRecords、miRTarBase 和TarBase),以及3个包含药物/疾病等相关信息的数据库(miR2Disease、Pharmaco-miR 和 PhenomiR),为基于miRNA-靶基因调控关系的疾病发病机制及诊疗相关研究提供极大便利。首先我们安装并加载m
multiMiR
一 包搞定miRNA与靶基因预测问题
嗨,大家好!做过miRNA相关研究的小伙伴一定都经历过这样的问题,即已知miRNA检索其靶基因都有哪些?或者是已知功能基因检索靶向其的miRNA都有哪些?目前已有的miRNA与靶基因调节关系的数据库有很多,如Tarbase、miRWalk、TargetScan、miRanda、RNAhybrid、TargetFinder、miRcode、miRDB、RNA22、starBase、miRTarBase、miRPathDB和ENCORI等。在应用时登录这些数据库逐个检索的话往往花费大量时间,尤其是某些数据库仅支持单个基因或miRNA检索时往往令人头疼。那么,今天带大家学习的是multiMiR包,内置14个miRNA相关数据库,仅一包搞定miRNA与靶基因预测问题,一起来看看吧~!
▌multiMiR基本信息
multiMiR包发布于2014年7月,当前版本于2020年4月更新,收录4 个外部数据库,其中8 个为预测的 miRNA-靶基因相互作用关系(DIANA-microT-CDS、ElMMo、MicroCosm、miRanda、miRDB、PicTar、PITA 和 TargetScan),其中有3 个为实验验证的 miRNA-靶基因相互作用关系(miRecords、miRTarBase 和TarBase),以及3个包含药物/疾病等相关信息的数据库(miR2Disease、Pharmaco-miR 和 PhenomiR),为基于miRNA-靶基因调控关系的疾病发病机制及诊疗相关研究提供极大便利。
首先我们安装并加载multiMiR包,资源来自于BiocManager网站,详细可参考:https://www.bioconductor.org/packages/release/bioc/vignettes/multiMiR/inst/doc/multiMiR.html。
##### 了解multiMiR包 #####
# 安装包
BiocManager::install("multiMiR")
# 加载包
library(multiMiR)
# Welcome to multiMiR.
#
# multiMiR database URL has been set to the
# default value: http://multimir.org/
#
# Database Version: 2.3.0 Updated: 2020-04-15
#
# Warning message:
# 程辑包‘multiMiR’是用R版本4.1.1 来建造的
关于multiMiR包版本:
multimir_dbInfoVersions()函数查看历史版本,当前2.3.0版本为2020年4月15号更新;
multimir_switchDBVersion()函数切换版本。
## 关于R包版本
db.ver <- multimir_dbInfoVersions() # 查看历史版本
db.ver
# VERSION UPDATED RDA DBNAME SCHEMA PUBLIC
# 1 2.3.0 2020-04-15 multimir_cutoffs_2.3.rda multimir2_3 multiMiR_DB_schema.sql 1
# 2 2.2.0 2017-08-08 multimir_cutoffs_2.2.rda multimir2_2 multiMiR_DB_schema.sql 1
# 3 2.1.0 2016-12-22 multimir_cutoffs_2.1.rda multimir2_1 multiMiR_DB_schema.sql 1
# 4 2.0.0 2015-05-01 multimir_cutoffs.rda multimir multiMiR_DB_schema.sql 1
multimir_switchDBVersion(db_version <- "2.0.0") # 切换版本
# Now using database version: 2.0.0
curr_vers <- db.ver[1 "VERSION"]
multimir_switchDBVersion(db_version <- curr_vers) # 切换版本
# Now using database version: 2.3.0
关于外部数据库:
multimirdbTables()函数查看14个外部数据库,multimirdbInfo()函数查看外部数据库版本信息;
predictedtables() 、validatedtables() 和diseasedrug_tables() 函数分别查看查看网站预测、实验验证的和收录疾病/药物相关性miRNA与靶基因关系数据库;
reversetablelookup("targetscan") 查看targetscan数据库类型为predicted;
multimir_dbCount() 函数查看数据库规模,版本multiMiR包含近 7000 万条记录,其中来自人类、小鼠和大鼠的 5830 个 miRNA 和 97186 个靶基因,以及 64 种药物和 223 个疾病术语。
db.tables <- multimir_dbTables() # 14个外部数据库
db.tables
# [1] "diana_microt" "elmmo" "map_counts" "map_metadata" "microcosm"
# [6] "mir2disease" "miranda" "mirdb" "mirecords" "mirna"
# [11] "mirtarbase" "pharmaco_mir" "phenomir" "pictar" "pita"
# [16] "tarbase" "target" "targetscan"
db.info <- multimir_dbInfo() # 外部数据库版本
db.info
# map_name source_name source_version source_date
# 1 diana_microt DIANA-microT 5 Sept 2013
# 2 elmmo EIMMo 5 Jan 2011
# 3 microcosm MicroCosm 5 Sept 2009
# 4 mir2disease miR2Disease Mar 14 2011
# 5 miranda miRanda Aug 2010
# 6 mirdb miRDB 6 June 2019
# 7 mirecords miRecords 4 Apr 27 2013
# 8 mirtarbase miRTarBase 7.0 Sept 2017
# 9 pharmaco_mir Pharmaco-miR (Verified Sets)
# 10 phenomir PhenomiR 2 Feb 15 2011
# 11 pictar PicTar 2 Dec 21 2012
# 12 pita PITA 6 Aug 31 2008
# 13 tarbase TarBase 8 2018
# 14 targetscan TargetScan 7.2 March 2018
predicted_tables() # 查看网站预测的miRNA-靶基因关系数据库
# [1] "diana_microt" "elmmo" "microcosm" "miranda" "mirdb" "pictar"
# [7] "pita" "targetscan"
validated_tables() # 查看实验验证的miRNA-靶基因关系数据库
# [1] "mirecords" "mirtarbase" "tarbase"
diseasedrug_tables() # 查看疾病/药物相关性miRNA-靶基因关系数据库
# [1] "mir2disease" "pharmaco_mir" "phenomir"
reverse_table_lookup("targetscan") # 查看数据库类型
# [1] "predicted"
db.count <- multimir_dbCount() # 查看数据库规模
db.count
# map_name human_count mouse_count rat_count total_count
# 1 diana_microt 7664602 3747171 0 11411773
# 2 elmmo 3959112 1449133 547191 5955436
# 3 microcosm 762987 534735 353378 1651100
# 4 mir2disease 2875 0 0 2875
# 5 miranda 5429955 2379881 247368 8057204
# 6 mirdb 1990425 1091263 199250 3280938
# 7 mirecords 2425 449 171 3045
# 8 mirtarbase 544588 50673 652 595913
# 9 pharmaco_mir 308 5 0 313
# 10 phenomir 15138 491 0 15629
# 11 pictar 404066 302236 0 706302
# 12 pita 7710936 5163153 0 12874089
# 13 tarbase 433048 209831 1307 644186
# 14 targetscan 13906497 10442093 0 24348590
apply(db.count[ -1] 2 sum) # 当前版本multiMiR包含近 7000 万条记录
# human_count mouse_count rat_count total_count
# 42826962 25371114 1349317 69547393
## 查看multiMiR收录的miRNA、gene、drug和disease
# 当前版本multiMiR收录来自人类、小鼠和大鼠的 5830 个 miRNA 和 97186 个靶基因,以及 64 种药物和 223 个疾病术语
miRNAs <- list_multimir("mirna" limit <- 10)
genes <- list_multimir("gene" limit <- 10)
drugs <- list_multimir("drug" limit <- 10)
diseases <- list_multimir("disease" limit <- 10)
head(miRNAs)
head(genes)
head(drugs)
head(diseases)
▌multiMiR应用示例
getmultimir()函数是multiMiR包的主体,首先运行 ??getmultimir 查看帮助文档。
关于几个参数介绍一下:
get_multimir(url = NULL org = "hsa" mirna = NULL target = NULL
disease.drug = NULL table = "validated" predicted.cutoff = NULL
predicted.cutoff.type = "p" predicted.site = "conserved"
summary = FALSE add.link = FALSE use.tibble = FALSE limit = NULL
legacy.out = FALSE)
org:表示物种来源,默认为人类。即org = "hsa" "human"或 "Homo Sapiens",其他两个物种是小鼠 ("mmu" "mouse"或"Mus musculus")和大鼠 ("rno" "rat"或"Rattus norvegicus")。
mirna:表示待检索的miRNA,默认为NULL。输入miRNA可以是miRNA accession number (i.e. "MIMAT0000072") mature miRNA ID (i.e. "hsa-miR-199a-3p") 或二者都有 (i.e. c("MIMAT0000065" "hsa-miR-30a-5p")),可以是单个miRNA或miRNA列表。
target:表示待检索靶基因,默认为NULL。输入基因可以是gene symbol (i.e. c("TP53" "KRAS")),Entrez gene ID (i.e. c(578 3845))、ensembl gene ID (i.e. "ENSG00000171791"),或三者都有 (i.e. c("TP53" 3845 "ENSG00000171791")),可以是单个基因或基因列表。
disease.drug:表示检索疾病/药物相关miRNA与靶基因关系,默认为NULL。输入参数可以是 disease(s) 和/或 drug(s) (i.e. c("bladder cancer" "cisplatin"))。
table:表示检索的数据库类型。默认validated,即实验验证的miRNA与靶基因关系数据库("mirecords" "mirtarbase" and "tarbase") 还可以是predicted,即网站预测的miRNA与靶基因关系数据库("dianamicrot" "elmmo" "microcosm" "miranda" "mirdb" "pictar" "pita" and "targetscan") 或者是disease.drug,即疾病/药物相关性miRNA与靶基因关系数据库 ( "mir2disease" "pharmacomir" and "phenomir") 若输入all表示检索以上全部数据库,或者输入某个数据库名称进行单个数据库检索。
predicted.cutoff和predicted.cutoff.type:表示数据库检索范围,默认为NULL,即若predicted.cutoff.type="p",检索范围为预测评分TOP20%,若predicted.cutoff.type="n",检索范围为预测评分TOP300000。
predicted.site:表示predicted数据库中检索保守("conserved")或非保守("nonconserved")位点,或不限制保守或非保守位点("all")。
summary:表示是否统计检索结果,默认FALSE。
1
示例一:检索给定miRNA所有经过验证的靶基因
以hsa-miR-182-5p为例,检索经实验验证的miRNA与靶基因调控关系,检索结果example1为S4格式文件,目录下data为检索结果列表。
## 检索给定miRNA所有经过验证的靶基因
# 以hsa-miR-182-5p为例
example1 <- get_multimir(org = "hsa" # 物种来源 hsa/mmu/rno
mirna = 'hsa-miR-182-5p'
table = "validated" # validated/predicted/all
summary = TRUE) # 检索结果汇总
# Searching mirecords ...
# Searching mirtarbase ...
# Searching tarbase ...
table(example1@data$type) # 查看检索结果 4057条信息
# validated
# 4057
example1_result <- example1@data # 提取data数据,检索结果列表
head(example1_result)
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# 1 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 2 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 3 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 4 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 5 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 6 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# target_ensembl experiment support_type pubmed_id type
# 1 ENSG00000174233 17597072 validated
# 2 ENSG00000174233 17597072 validated
# 3 ENSG00000174233 17597072 validated
# 4 ENSG00000174233 17597072 validated
# 5 ENSG00000174233 17597072 validated
# 6 ENSG00000174233 17597072 validated
检索结果列表提供hsa-miR-182-5p靶基因gene symbol、entrez ID和ensembl ID,以及来源数据库、实验方法和参考文献等信息。
接下来统计一下检索结果,mirecords数据库来源信息12条,mirtarbase数据库来源信息260条,tarbase数据库来源信息3785条,总计不重复结果3732条。
# 统计一下检索结果
example1_sum <- example1@summary # 提取summary数据
head(example1_sum)
# mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl mirecords
# 1 MIMAT0000259 hsa-miR-182-5p ADCY6 112 ENSG00000174233 6
# 2 MIMAT0000259 hsa-miR-182-5p MITF 4286 ENSG00000187098 4
# 3 MIMAT0000259 hsa-miR-182-5p AMMECR1L 83607 ENSG00000144233 0
# 4 MIMAT0000259 hsa-miR-182-5p ARRDC3 57561 ENSG00000113369 0
# 5 MIMAT0000259 hsa-miR-182-5p ATF1 466 ENSG00000123268 0
# 6 MIMAT0000259 hsa-miR-182-5p ATP13A3 79572 ENSG00000133657 0
# mirtarbase tarbase validated.sum all.sum
# 1 2 1 3 3
# 2 2 1 3 3
# 3 1 1 2 2
# 4 1 2 2 2
# 5 1 1 2 2
# 6 1 1 2 2
apply(example1@summary[ 6:10] 2 sum)
# mirecords mirtarbase tarbase validated.sum all.sum
# 12 260 3785 3732 3732
筛选出经过荧光素酶报告实验验证的结果,60个。
# 筛选经荧光素酶报告实验验证的结果
example1_Lucifer <- example1_result[grep("Luciferase"
example1_result[ "experiment"]) ]
head(example1_Lucifer)
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# 16 mirtarbase MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 17 mirtarbase MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 18 mirtarbase MIMAT0000259 hsa-miR-182-5p ATF1 466
# 21 mirtarbase MIMAT0000259 hsa-miR-182-5p BARD1 580
# 22 mirtarbase MIMAT0000259 hsa-miR-182-5p BCL2 596
# 25 mirtarbase MIMAT0000259 hsa-miR-182-5p BDNF 627
# target_ensembl experiment
# 16 ENSG00000174233 Luciferase reporter assay
# 17 ENSG00000174233 Luciferase reporter assay//qRT-PCR
# 18 ENSG00000123268 Luciferase reporter assay
# 21 ENSG00000138376 Luciferase reporter assay
# 22 ENSG00000171791 Flow//Luciferase reporter assay//qRT-PCR//Western blot
# 25 ENSG00000176697 Luciferase reporter assay//qRT-PCR
# support_type pubmed_id type
# 16 Functional MTI 17597072 validated
# 17 Functional MTI 20656788 validated
# 18 Functional MTI 23249749 validated
# 21 Functional MTI 23249749 validated
# 22 Functional MTI 22848417 validated
# 25 Functional MTI 25955435 validated
接下来,将上述3个数据库检索结果绘制Veen图,3个数据库检索结果取交集获得2个基因,即"ADCY6"和"MITF"。
# 提取3个数据检索得到的target_symbol,绘制Veen图
library(VennDiagram)
library(tidyverse)
db <- unique(example1_result$database)
row1 <- example1_result %>% filter(database == db[1])
row2 <- example1_result %>% filter(database == db[2])
row3 <- example1_result %>% filter(database == db[3])
venn.plot <-venn.diagram(
x = list(target1=unique(row1$target_symbol)
target2=unique(row2$target_symbol)
target3=unique(row3$target_symbol))
filename ="Venn1.tif"
lty ="dotted"
lwd =0.5
col ="black"
fill =c("dodgerblue" "goldenrod1" "darkorange1")
alpha =0.60
cat.col =c("dodgerblue" "goldenrod1" "darkorange1")
cat.cex =1
cat.fontface="bold"
margin =0.05
cex =1)
# 获取交集
venn_list <- list(target1=unique(row1$target_symbol)
target2=unique(row2$target_symbol)
target3=unique(row3$target_symbol))
for (i in 1:length(venn_list)) {
if(i == 1){interGenes <- venn_list[[1]]}
else{interGenes <- intersect(interGenes venn_list[[i]])}
}
interGenes
# [1] "ADCY6" "MITF"
2
示例二:检索与给定药物/疾病相关的 miRNA-靶基因
以检索与顺铂有关的人类miRNA-靶基因相互作用关系为例,定义isease.drug参数为顺铂"cisplatin",table 参数为'disease.drug',结果有45个。
# 检索与顺铂有关的人类miRNA-靶基因相互作用关系
example2 <- get_multimir(org = "hsa"
disease.drug = 'cisplatin' # 顺铂
table = 'disease.drug'
summary = TRUE)
# Searching mir2disease ...
# Searching pharmaco_mir ...
# Searching phenomir ...
table(example2@data$type) # 查看检索结果
# disease.drug
# 45
example2_result <- example2@data
head(example2_result)
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# 1 pharmaco_mir MIMAT0000772 hsa-miR-345-5p ABCC1 4363
# 2 pharmaco_mir MIMAT0000720 hsa-miR-376c-3p ALK7
# 3 pharmaco_mir MIMAT0000423 hsa-miR-125b-5p BAK1 578
# 4 pharmaco_mir hsa-miR-34 BCL2 596
# 5 pharmaco_mir MIMAT0000318 hsa-miR-200b-3p BCL2 596
# 6 pharmaco_mir MIMAT0000617 hsa-miR-200c-3p BCL2 596
# target_ensembl disease_drug paper_pubmedID type
# 1 ENSG00000103222 cisplatin 20099276 disease.drug
# 2 cisplatin 21224400 disease.drug
# 3 ENSG00000030110 cisplatin 21823019 disease.drug
# 4 ENSG00000171791 cisplatin 18803879 disease.drug
# 5 ENSG00000171791 cisplatin 21993663 disease.drug
# 6 ENSG00000171791 cisplatin 21993663 disease.drug
3
示例三:检索靶向调节给定基因的miRNA
以检索网站预测的小鼠中靶向调节Gnb1基因的miRNA为例,检索范围设置为预测评分TOP 35%,检索结果绘制upset展示。
# 检索网站预测的小鼠中靶向调节Gnb1基因的miRNA
example3 <- get_multimir(org = "mmu" # 物种来源 hsa/mmu/rno
target = "Gnb1" # 目标靶基因
table = "predicted" # 预测的miRNA-target
summary = TRUE
predicted.cutoff = 35 # 检索范围为预测评分TOP 35%
predicted.cutoff.type = "p" # 检索范围为预测评分TOP 35%
predicted.site = "all") # conserved/nonconserved/all
# Searching diana_microt ...
# Searching elmmo ...
# Searching microcosm ...
# Searching miranda ...
# Searching mirdb ...
# Searching pictar ...
# Searching pita ...
# Searching targetscan ...
table(example3@data$type) # 查看检索结果
# predicted
# 716
example3_result <- example3@data
head(example3_result)
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# 1 diana_microt MIMAT0000663 mmu-miR-218-5p Gnb1 14688
# 2 diana_microt MIMAT0017276 mmu-miR-493-5p Gnb1 14688
# 3 diana_microt MIMAT0000656 mmu-miR-139-5p Gnb1 14688
# 4 diana_microt MIMAT0014946 mmu-miR-3074-2-3p Gnb1 14688
# 5 diana_microt MIMAT0000144 mmu-miR-132-3p Gnb1 14688
# 6 diana_microt MIMAT0020608 mmu-miR-5101 Gnb1 14688
# target_ensembl score type
# 1 ENSMUSG00000029064 0.975 predicted
# 2 ENSMUSG00000029064 0.964 predicted
# 3 ENSMUSG00000029064 0.96 predicted
# 4 ENSMUSG00000029064 0.921 predicted
# 5 ENSMUSG00000029064 0.92 predicted
# 6 ENSMUSG00000029064 0.918 predicted
example3_sum <- example3@summary # 统计检索结果
head(example3_sum)
apply(example3@summary[ 6:13] 2 sum)
# diana_microt elmmo microcosm miranda mirdb pictar
# 105 108 5 49 59 18
# pita targetscan
# 175 197
获得8个数据库预测结果,接下来绘制upset图,首先准备upset绘图文件,如下图所示。
# 准备upset绘图文件
db2 <- unique(example3_result$database)
row1 <- example3_result %>% filter(database == db2[1])
row2 <- example3_result %>% filter(database == db2[2])
row3 <- example3_result %>% filter(database == db2[3])
row4 <- example3_result %>% filter(database == db2[4])
row5 <- example3_result %>% filter(database == db2[5])
row6 <- example3_result %>% filter(database == db2[6])
row7 <- example3_result %>% filter(database == db2[7])
row8 <- example3_result %>% filter(database == db2[8])
data3 <- cbind(unique(row1$mature_mirna_id)
unique(row2$mature_mirna_id)
unique(row3$mature_mirna_id)
unique(row4$mature_mirna_id)
unique(row5$mature_mirna_id)
unique(row6$mature_mirna_id)
unique(row7$mature_mirna_id)
unique(row8$mature_mirna_id)) %>%
as.data.frame()
colnames(data3) <- db2
# 构建函数
Upsetdata <- function(data){
require(tidyverse)
Genes <- data %>%
do.call(c .) %>%
unique()
upset0 = matrix(NA nrow = length(Genes) ncol = length(data))%>%
as.data.frame() %>%
`rownames<-`(Genes) %>%
rownames_to_column("Gene")
colnames(upset0)<-c("Gene" names(data))
for (i in 2:c(length(data) 1)) {
upset0[match(data[[i-1]] upset0[["Gene"]]) i] = 1
upset0[is.na(upset0[[i]]) i] <- 0
}
return(upset0)
}
upset_data <- Upsetdata(data3)
接下来绘制upset图,需调用UpSetR包。
# 绘制upset图
library(UpSetR)
upset(upset_data
nsets =13 # 绘制全部13个基因集 默认绘制前5个基因集
keep.order=TRUE # 保持输入文件列表中基因顺序
point.size =2
matrix.color=c("#BC80BD") # 点线图颜色
line.size =0.3
matrix.dot.alpha = 0.5 # 非交集点透明度
shade.color = "blue" #背景颜色
shade.alpha = 0.1) #背景透明度
4
示例四:批量检索目标miRNA靶基因或靶向目标基因的miRNA
以示例数据进行演示,示例数据DE.miRNA.up包含 9 个上调miRNA,DE.entrez.dn有47个下调的基因,加载方式为load(url("http://multimir.org/bladder.rda"))。
# 加载示例数据
# DE.miRNA.up包含 9 个上调miRNA,DE.entrez.dn有47个下调的基因
load(url("http://multimir.org/bladder.rda"))
View(DE.gene.info)
View(DE.miRNA.info)
以已知miRNA列表预测靶基因为例,结果14个数据库预测结果中,实验验证的有24806个,网站预测的有37331个,疾病/药物相关的有451个。
# 已知miRNA列表预测靶基因
example4 <- get_multimir(org = "hsa"
mirna = DE.miRNA.up
table = "all" # 检索14个数据库
summary = TRUE
predicted.cutoff.type = "p"
predicted.cutoff = 10
use.tibble = TRUE)
# Searching mirecords ...
# Searching mirtarbase ...
# Searching tarbase ...
# Searching diana_microt ...
# Searching elmmo ...
# Searching microcosm ...
# Searching miranda ...
# Searching mirdb ...
# Searching pictar ...
# Searching pita ...
# Searching targetscan ...
# Searching pharmaco_mir ...
# Joining by = c("database" "mature_mirna_acc" "mature_mirna_id" "target_symbol" "target_entrez" "target_ensembl" "type")
# Joining by = c("database" "mature_mirna_acc" "mature_mirna_id" "target_symbol" "target_entrez" "target_ensembl" "type")
save(example4 file = "example4.Rdata") # 时间较长,建议及时保存检索结果
load(file = "example4.Rdata")
table(example4@data$type)
# disease.drug predicted validated
# 451 37331 24806
example4_result <- example4@data
head(example4_result)
example4_sum <- example4@summary
head(example4_sum)
apply(example4@summary[ 6:23] 2 sum)
# diana_microt elmmo microcosm mir2disease miranda mirdb
# 9015 18378 696 114 4048 2722
# mirecords mirtarbase pharmaco_mir phenomir pictar pita
# 115 4175 9 328 25 1656
# tarbase targetscan disease.sum predicted.sum validated.sum all.sum
# 20516 791 22 21626 22970 45847
# 已知mRNA列表检索miRNA
example4_2 <- get_multimir(org = "hsa"
target = DE.entrez.dn
table = "all" # 检索14个数据库
summary = TRUE
predicted.cutoff.type = "p"
predicted.cutoff = 10
use.tibble = TRUE)
save(example4_2 file = "example4_2.Rdata") # 时间较长,建议及时保存检索结果
load(file = "example4_2.Rdata")
table(example4@data$type)
example4_result2 <- example4@data
head(example4_result2)
example4_sum2 <- example4@summary
head(example4_sum2)
apply(example4@summary[ 6:23] 2 sum)
2
示例五:检索一组 miRNA 和一组基因之间的相互作用
以示例数据进行演示,示例数据DE.miRNA.up包含 9 个上调miRNA,DE.entrez.dn有47个下调的基因,加载方式为load(url("http://multimir.org/bladder.rda"))。结果显示,网站预测、实验验证和疾病/药物相关的miRNA与靶基因调节关系有160个、98个和442个。
# 加载示例数据
load(url("http://multimir.org/bladder.rda"))
# 数据库检索
example5 <- get_multimir(org = "hsa"
mirna = DE.miRNA.up
target = DE.entrez.dn
table = "all"
summary = TRUE
predicted.cutoff.type = "p"
predicted.cutoff = 10
use.tibble = TRUE)
save(example5 file = "example5.Rdata")
load(file = "example5.Rdata")
table(example5@data$type)
# disease.drug predicted validated
# 442 160 98
example5_result <- example5@data
head(example5_result)
example5_sum <- example5@summary
head(example5_sum)
apply(example5@summary[ 6:19] 2 sum)
# diana_microt elmmo mir2disease miranda mirdb mirtarbase
# 27 92 114 13 11 8
# phenomir pita tarbase targetscan disease.sum predicted.sum
# 328 14 90 3 16 88
# validated.sum all.sum
# 91 198
接下来,筛选出实验验证的检索结果,有85个miRNA与靶基因调节关系,其中7个miRNA和41个mRNA,可按照不同数据库展示检索结果。
# 筛选实验验证的结果
result <- select(example5 keytype = "type"
keys = "validated"
columns = columns(example5))
# 去重
unique_pairs <- result[!duplicated(result[ c("mature_mirna_id"
"target_entrez")]) ] # 去重方法一
unique_pairs_2 <-
unique(data.frame(miRNA.ID = as.character(result$mature_mirna_id)
target.Entrez = as.character(result$target_entrez))) # 去重方法二
nrow(unique_pairs) # 85个miRNA-target
nrow(unique_pairs_2) # 85个miRNA-target
unique_miRNA <- unique(result$mature_mirna_id)
length(unique_miRNA) # 7个miRNA
unique_target <- unique(result$target_symbol)
length(unique_target) # 41个mRNA
# 按照不同数据库展示检索结果
example5_split <- split(result result$database) # 按照数据库分类
example5_split$mirtarbase
# # A tibble: 8 x 13
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# <chr> <chr> <chr> <chr> <chr>
# 1 mirtarbase MIMAT0000087 hsa-miR-30a-5p FDX1 2230
# 2 mirtarbase MIMAT0000259 hsa-miR-182-5p CUL5 8065
# 3 mirtarbase MIMAT0000418 hsa-miR-23b-3p RRAS2 22800
# 4 mirtarbase MIMAT0000087 hsa-miR-30a-5p LIMCH1 22998
# 5 mirtarbase MIMAT0000418 hsa-miR-23b-3p SWAP70 23075
# 6 mirtarbase MIMAT0000087 hsa-miR-30a-5p PEG10 23089
# 7 mirtarbase MIMAT0000245 hsa-miR-30d-5p PEG10 23089
# 8 mirtarbase MIMAT0000420 hsa-miR-30b-5p PEG10 23089
# # ... with 8 more variables: target_ensembl <chr> experiment <chr>
# # support_type <chr> pubmed_id <chr> type <chr> score <chr>
# # disease_drug <chr> paper_pubmedID <chr>
example5_split$tarbase
# # A tibble: 90 x 13
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# <chr> <chr> <chr> <chr> <chr>
# 1 tarbase MIMAT0000418 hsa-miR-23b-3p LIMA1 51474
# 2 tarbase MIMAT0000449 hsa-miR-146a-5p LIMA1 51474
# 3 tarbase MIMAT0000259 hsa-miR-182-5p FOXN3 1112
# 4 tarbase MIMAT0000449 hsa-miR-146a-5p FOXN3 1112
# 5 tarbase MIMAT0000418 hsa-miR-23b-3p BCAT1 586
# 6 tarbase MIMAT0000449 hsa-miR-146a-5p BCAT1 586
# 7 tarbase MIMAT0000080 hsa-miR-24-3p BCAT1 586
# 8 tarbase MIMAT0000087 hsa-miR-30a-5p LIMCH1 22998
# 9 tarbase MIMAT0000259 hsa-miR-182-5p LIMCH1 22998
# 10 tarbase MIMAT0000259 hsa-miR-182-5p IPO5 3843
# # ... with 80 more rows and 8 more variables: target_ensembl <chr>
# # experiment <chr> support_type <chr> pubmed_id <chr> type <chr>
# # score <chr> disease_drug <chr> paper_pubmedID <chr>
我们在筛选一个与膀胱癌相关的检索结果,即hsa-miR-23b-3p和hsa-miR-146a-5p。再检索一下实验验证的靶基因,以hsa-miR-23b-3p为例,结果以Veen图展示,靶基因有3个,即"NOTCH1","PLAU"和"MET"。
# 筛选膀胱癌相关结果
mykeys <- keys(example5 keytype = "disease_drug")
mykeys <- mykeys[grep("bladder" mykeys ignore.case = TRUE)]
result2 <- select(example5 keytype = "disease_drug" keys = mykeys
columns = columns(example5))
unique_pairs2 <- result2[!duplicated(result[ c("mature_mirna_id"
"target_entrez")]) ]
nrow(unique_pairs2) # 2个miRNA-target
# [1] 2
unique_pairs2
# # A tibble: 2 x 13
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# <chr> <chr> <chr> <chr> <chr>
# 1 mir2disease MIMAT0000418 hsa-miR-23b-3p NA NA
# 2 phenomir MIMAT0000449 hsa-miR-146a-5p NA NA
# # ... with 8 more variables: target_ensembl <chr> experiment <chr>
# # support_type <chr> pubmed_id <chr> type <chr> score <chr>
# # disease_drug <chr> paper_pubmedID <chr>
# 以hsa-miR-23b-3p为例检索实验验证的靶基因
example5_2 <- get_multimir(org = "hsa"
mirna = 'hsa-miR-23b-3p'
table = "validated"
summary = TRUE)
apply(example5_2@summary[ 6:10] 2 sum)
# mirecords mirtarbase tarbase validated.sum all.sum
# 7 433 3660 3750 3750
example1_Lucifer <- example5_2@data[grep("Luciferase"
example1_result[ "experiment"]) ]
nrow(example1_Lucifer) # Luciferase实验验证结果4条
# [1] 42
# 绘制Veen图
library(VennDiagram)
library(tidyverse)
db <- unique(example5_2@data$database)
row1 <- example5_2@data %>% filter(database == db[1])
row2 <- example5_2@data %>% filter(database == db[2])
row3 <- example5_2@data %>% filter(database == db[3])
venn.plot <-venn.diagram(
x = list(target1=unique(row1$target_symbol)
target2=unique(row2$target_symbol)
target3=unique(row3$target_symbol))
filename ="Venn1.tif"
lty ="dotted"
lwd =0.5
col ="black"
fill =c("dodgerblue" "goldenrod1" "darkorange1")
alpha =0.60
cat.col =c("dodgerblue" "goldenrod1" "darkorange1")
cat.cex =1
cat.fontface="bold"
margin =0.05
cex =1)
# 获取交集
venn_list <- list(target1=unique(row1$target_symbol)
target2=unique(row2$target_symbol)
target3=unique(row3$target_symbol))
for (i in 1:length(venn_list)) {
if(i == 1){interGenes <- venn_list[[1]]}
else{interGenes <- intersect(interGenes venn_list[[i]])}
}
interGenes
[1] "NOTCH1" "PLAU" "MET"
以上就是multiMiR包全部内容,开发并维护R包不易,小伙伴们使用时别忘记引用以下文献哦~!
Ru Y Kechris KJ Tabakoff B Hoffman P Radcliffe RA Bowler R Mahaffey S Rossi S Calin GA Bemis L Theodorescu D. The multiMiR R package and database: integration of microRNA-target interactions along with their disease and drug associations. Nucleic Acids Res. 2014;42(17):e133. doi: 10.1093/nar/gku631. Epub 2014 Jul 24. PMID: 25063298; PMCID: PMC4176155.