快捷搜索:  汽车  科技

mirna预测分析(包搞定miRNA与靶基因预测问题)

mirna预测分析(包搞定miRNA与靶基因预测问题)multimir_switchDBVersion()函数切换版本。multimir_dbInfoVersions()函数查看历史版本,当前2.3.0版本为2020年4月15号更新;multiMiR包发布于2014年7月,当前版本于2020年4月更新,收录4 个外部数据库,其中8 个为预测的 miRNA-靶基因相互作用关系(DIANA-microT-CDS、ElMMo、MicroCosm、miRanda、miRDB、PicTar、PITA 和 TargetScan),其中有3 个为实验验证的 miRNA-靶基因相互作用关系(miRecords、miRTarBase 和TarBase),以及3个包含药物/疾病等相关信息的数据库(miR2Disease、Pharmaco-miR 和 PhenomiR),为基于miRNA-靶基因调控关系的疾病发病机制及诊疗相关研究提供极大便利。首先我们安装并加载m

multiMiR

一 包搞定miRNA与靶基因预测问题

嗨,大家好!做过miRNA相关研究的小伙伴一定都经历过这样的问题,即已知miRNA检索其靶基因都有哪些?或者是已知功能基因检索靶向其的miRNA都有哪些?目前已有的miRNA与靶基因调节关系的数据库有很多,如Tarbase、miRWalk、TargetScan、miRanda、RNAhybrid、TargetFinder、miRcode、miRDB、RNA22、starBase、miRTarBase、miRPathDB和ENCORI等。在应用时登录这些数据库逐个检索的话往往花费大量时间,尤其是某些数据库仅支持单个基因或miRNA检索时往往令人头疼。那么,今天带大家学习的是multiMiR包,内置14个miRNA相关数据库,仅一包搞定miRNA与靶基因预测问题,一起来看看吧~!

▌multiMiR基本信息

multiMiR包发布于2014年7月,当前版本于2020年4月更新,收录4 个外部数据库,其中8 个为预测的 miRNA-靶基因相互作用关系(DIANA-microT-CDS、ElMMo、MicroCosm、miRanda、miRDB、PicTar、PITA 和 TargetScan),其中有3 个为实验验证的 miRNA-靶基因相互作用关系(miRecords、miRTarBase 和TarBase),以及3个包含药物/疾病等相关信息的数据库(miR2Disease、Pharmaco-miR 和 PhenomiR),为基于miRNA-靶基因调控关系的疾病发病机制及诊疗相关研究提供极大便利。

mirna预测分析(包搞定miRNA与靶基因预测问题)(1)

首先我们安装并加载multiMiR包,资源来自于BiocManager网站,详细可参考:https://www.bioconductor.org/packages/release/bioc/vignettes/multiMiR/inst/doc/multiMiR.html

##### 了解multiMiR包 ##### # 安装包 BiocManager::install("multiMiR") # 加载包 library(multiMiR) # Welcome to multiMiR. # # multiMiR database URL has been set to the # default value: http://multimir.org/ # # Database Version: 2.3.0 Updated: 2020-04-15 # # Warning message: # 程辑包‘multiMiR’是用R版本4.1.1 来建造的

关于multiMiR包版本:

multimir_dbInfoVersions()函数查看历史版本,当前2.3.0版本为2020年4月15号更新;

multimir_switchDBVersion()函数切换版本。

## 关于R包版本 db.ver <- multimir_dbInfoVersions() # 查看历史版本 db.ver # VERSION UPDATED RDA DBNAME SCHEMA PUBLIC # 1 2.3.0 2020-04-15 multimir_cutoffs_2.3.rda multimir2_3 multiMiR_DB_schema.sql 1 # 2 2.2.0 2017-08-08 multimir_cutoffs_2.2.rda multimir2_2 multiMiR_DB_schema.sql 1 # 3 2.1.0 2016-12-22 multimir_cutoffs_2.1.rda multimir2_1 multiMiR_DB_schema.sql 1 # 4 2.0.0 2015-05-01 multimir_cutoffs.rda multimir multiMiR_DB_schema.sql 1 multimir_switchDBVersion(db_version <- "2.0.0") # 切换版本 # Now using database version: 2.0.0 curr_vers <- db.ver[1 "VERSION"] multimir_switchDBVersion(db_version <- curr_vers) # 切换版本 # Now using database version: 2.3.0

关于外部数据库:

multimirdbTables()函数查看14个外部数据库,multimirdbInfo()函数查看外部数据库版本信息;

predictedtables() 、validatedtables() 和diseasedrug_tables() 函数分别查看查看网站预测、实验验证的和收录疾病/药物相关性miRNA与靶基因关系数据库;

reversetablelookup("targetscan") 查看targetscan数据库类型为predicted;

multimir_dbCount() 函数查看数据库规模,版本multiMiR包含近 7000 万条记录,其中来自人类、小鼠和大鼠的 5830 个 miRNA 和 97186 个靶基因,以及 64 种药物和 223 个疾病术语。

db.tables <- multimir_dbTables() # 14个外部数据库 db.tables # [1] "diana_microt" "elmmo" "map_counts" "map_metadata" "microcosm" # [6] "mir2disease" "miranda" "mirdb" "mirecords" "mirna" # [11] "mirtarbase" "pharmaco_mir" "phenomir" "pictar" "pita" # [16] "tarbase" "target" "targetscan" db.info <- multimir_dbInfo() # 外部数据库版本 db.info # map_name source_name source_version source_date # 1 diana_microt DIANA-microT 5 Sept 2013 # 2 elmmo EIMMo 5 Jan 2011 # 3 microcosm MicroCosm 5 Sept 2009 # 4 mir2disease miR2Disease Mar 14 2011 # 5 miranda miRanda Aug 2010 # 6 mirdb miRDB 6 June 2019 # 7 mirecords miRecords 4 Apr 27 2013 # 8 mirtarbase miRTarBase 7.0 Sept 2017 # 9 pharmaco_mir Pharmaco-miR (Verified Sets) # 10 phenomir PhenomiR 2 Feb 15 2011 # 11 pictar PicTar 2 Dec 21 2012 # 12 pita PITA 6 Aug 31 2008 # 13 tarbase TarBase 8 2018 # 14 targetscan TargetScan 7.2 March 2018 predicted_tables() # 查看网站预测的miRNA-靶基因关系数据库 # [1] "diana_microt" "elmmo" "microcosm" "miranda" "mirdb" "pictar" # [7] "pita" "targetscan" validated_tables() # 查看实验验证的miRNA-靶基因关系数据库 # [1] "mirecords" "mirtarbase" "tarbase" diseasedrug_tables() # 查看疾病/药物相关性miRNA-靶基因关系数据库 # [1] "mir2disease" "pharmaco_mir" "phenomir" reverse_table_lookup("targetscan") # 查看数据库类型 # [1] "predicted" db.count <- multimir_dbCount() # 查看数据库规模 db.count # map_name human_count mouse_count rat_count total_count # 1 diana_microt 7664602 3747171 0 11411773 # 2 elmmo 3959112 1449133 547191 5955436 # 3 microcosm 762987 534735 353378 1651100 # 4 mir2disease 2875 0 0 2875 # 5 miranda 5429955 2379881 247368 8057204 # 6 mirdb 1990425 1091263 199250 3280938 # 7 mirecords 2425 449 171 3045 # 8 mirtarbase 544588 50673 652 595913 # 9 pharmaco_mir 308 5 0 313 # 10 phenomir 15138 491 0 15629 # 11 pictar 404066 302236 0 706302 # 12 pita 7710936 5163153 0 12874089 # 13 tarbase 433048 209831 1307 644186 # 14 targetscan 13906497 10442093 0 24348590 apply(db.count[ -1] 2 sum) # 当前版本multiMiR包含近 7000 万条记录 # human_count mouse_count rat_count total_count # 42826962 25371114 1349317 69547393 ## 查看multiMiR收录的miRNA、gene、drug和disease # 当前版本multiMiR收录来自人类、小鼠和大鼠的 5830 个 miRNA 和 97186 个靶基因,以及 64 种药物和 223 个疾病术语 miRNAs <- list_multimir("mirna" limit <- 10) genes <- list_multimir("gene" limit <- 10) drugs <- list_multimir("drug" limit <- 10) diseases <- list_multimir("disease" limit <- 10) head(miRNAs) head(genes) head(drugs) head(diseases)

▌multiMiR应用示例

getmultimir()函数是multiMiR包的主体,首先运行 ??getmultimir 查看帮助文档。

mirna预测分析(包搞定miRNA与靶基因预测问题)(2)

关于几个参数介绍一下:

get_multimir(url = NULL org = "hsa" mirna = NULL target = NULL
disease.drug = NULL table = "validated" predicted.cutoff = NULL
predicted.cutoff.type = "p" predicted.site = "conserved"
summary = FALSE add.link = FALSE use.tibble = FALSE limit = NULL
legacy.out = FALSE)

org:表示物种来源,默认为人类。即org = "hsa" "human"或 "Homo Sapiens",其他两个物种是小鼠 ("mmu" "mouse"或"Mus musculus")和大鼠 ("rno" "rat"或"Rattus norvegicus")。

mirna:表示待检索的miRNA,默认为NULL。输入miRNA可以是miRNA accession number (i.e. "MIMAT0000072") mature miRNA ID (i.e. "hsa-miR-199a-3p") 或二者都有 (i.e. c("MIMAT0000065" "hsa-miR-30a-5p")),可以是单个miRNA或miRNA列表。

target:表示待检索靶基因,默认为NULL。输入基因可以是gene symbol (i.e. c("TP53" "KRAS")),Entrez gene ID (i.e. c(578 3845))、ensembl gene ID (i.e. "ENSG00000171791"),或三者都有 (i.e. c("TP53" 3845 "ENSG00000171791")),可以是单个基因或基因列表。

disease.drug:表示检索疾病/药物相关miRNA与靶基因关系,默认为NULL。输入参数可以是 disease(s) 和/或 drug(s) (i.e. c("bladder cancer" "cisplatin"))。

table:表示检索的数据库类型。默认validated,即实验验证的miRNA与靶基因关系数据库("mirecords" "mirtarbase" and "tarbase") 还可以是predicted,即网站预测的miRNA与靶基因关系数据库("dianamicrot" "elmmo" "microcosm" "miranda" "mirdb" "pictar" "pita" and "targetscan") 或者是disease.drug,即疾病/药物相关性miRNA与靶基因关系数据库 ( "mir2disease" "pharmacomir" and "phenomir") 若输入all表示检索以上全部数据库,或者输入某个数据库名称进行单个数据库检索。

predicted.cutoff和predicted.cutoff.type:表示数据库检索范围,默认为NULL,即若predicted.cutoff.type="p",检索范围为预测评分TOP20%,若predicted.cutoff.type="n",检索范围为预测评分TOP300000。

predicted.site:表示predicted数据库中检索保守("conserved")或非保守("nonconserved")位点,或不限制保守或非保守位点("all")。

summary:表示是否统计检索结果,默认FALSE。

1

示例一:检索给定miRNA所有经过验证的靶基因

以hsa-miR-182-5p为例,检索经实验验证的miRNA与靶基因调控关系,检索结果example1为S4格式文件,目录下data为检索结果列表。

mirna预测分析(包搞定miRNA与靶基因预测问题)(3)

## 检索给定miRNA所有经过验证的靶基因 # 以hsa-miR-182-5p为例 example1 <- get_multimir(org = "hsa" # 物种来源 hsa/mmu/rno mirna = 'hsa-miR-182-5p' table = "validated" # validated/predicted/all summary = TRUE) # 检索结果汇总 # Searching mirecords ... # Searching mirtarbase ... # Searching tarbase ... table(example1@data$type) # 查看检索结果 4057条信息 # validated # 4057 example1_result <- example1@data # 提取data数据,检索结果列表 head(example1_result) # database mature_mirna_acc mature_mirna_id target_symbol target_entrez # 1 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112 # 2 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112 # 3 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112 # 4 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112 # 5 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112 # 6 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112 # target_ensembl experiment support_type pubmed_id type # 1 ENSG00000174233 17597072 validated # 2 ENSG00000174233 17597072 validated # 3 ENSG00000174233 17597072 validated # 4 ENSG00000174233 17597072 validated # 5 ENSG00000174233 17597072 validated # 6 ENSG00000174233 17597072 validated

检索结果列表提供hsa-miR-182-5p靶基因gene symbol、entrez ID和ensembl ID,以及来源数据库、实验方法和参考文献等信息。

mirna预测分析(包搞定miRNA与靶基因预测问题)(4)

接下来统计一下检索结果,mirecords数据库来源信息12条,mirtarbase数据库来源信息260条,tarbase数据库来源信息3785条,总计不重复结果3732条。

# 统计一下检索结果 example1_sum <- example1@summary # 提取summary数据 head(example1_sum) # mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl mirecords # 1 MIMAT0000259 hsa-miR-182-5p ADCY6 112 ENSG00000174233 6 # 2 MIMAT0000259 hsa-miR-182-5p MITF 4286 ENSG00000187098 4 # 3 MIMAT0000259 hsa-miR-182-5p AMMECR1L 83607 ENSG00000144233 0 # 4 MIMAT0000259 hsa-miR-182-5p ARRDC3 57561 ENSG00000113369 0 # 5 MIMAT0000259 hsa-miR-182-5p ATF1 466 ENSG00000123268 0 # 6 MIMAT0000259 hsa-miR-182-5p ATP13A3 79572 ENSG00000133657 0 # mirtarbase tarbase validated.sum all.sum # 1 2 1 3 3 # 2 2 1 3 3 # 3 1 1 2 2 # 4 1 2 2 2 # 5 1 1 2 2 # 6 1 1 2 2 apply(example1@summary[ 6:10] 2 sum) # mirecords mirtarbase tarbase validated.sum all.sum # 12 260 3785 3732 3732

筛选出经过荧光素酶报告实验验证的结果,60个。

# 筛选经荧光素酶报告实验验证的结果 example1_Lucifer <- example1_result[grep("Luciferase" example1_result[ "experiment"]) ] head(example1_Lucifer) # database mature_mirna_acc mature_mirna_id target_symbol target_entrez # 16 mirtarbase MIMAT0000259 hsa-miR-182-5p ADCY6 112 # 17 mirtarbase MIMAT0000259 hsa-miR-182-5p ADCY6 112 # 18 mirtarbase MIMAT0000259 hsa-miR-182-5p ATF1 466 # 21 mirtarbase MIMAT0000259 hsa-miR-182-5p BARD1 580 # 22 mirtarbase MIMAT0000259 hsa-miR-182-5p BCL2 596 # 25 mirtarbase MIMAT0000259 hsa-miR-182-5p BDNF 627 # target_ensembl experiment # 16 ENSG00000174233 Luciferase reporter assay # 17 ENSG00000174233 Luciferase reporter assay//qRT-PCR # 18 ENSG00000123268 Luciferase reporter assay # 21 ENSG00000138376 Luciferase reporter assay # 22 ENSG00000171791 Flow//Luciferase reporter assay//qRT-PCR//Western blot # 25 ENSG00000176697 Luciferase reporter assay//qRT-PCR # support_type pubmed_id type # 16 Functional MTI 17597072 validated # 17 Functional MTI 20656788 validated # 18 Functional MTI 23249749 validated # 21 Functional MTI 23249749 validated # 22 Functional MTI 22848417 validated # 25 Functional MTI 25955435 validated

mirna预测分析(包搞定miRNA与靶基因预测问题)(5)

接下来,将上述3个数据库检索结果绘制Veen图,3个数据库检索结果取交集获得2个基因,即"ADCY6"和"MITF"。

# 提取3个数据检索得到的target_symbol,绘制Veen图 library(VennDiagram) library(tidyverse) db <- unique(example1_result$database) row1 <- example1_result %>% filter(database == db[1]) row2 <- example1_result %>% filter(database == db[2]) row3 <- example1_result %>% filter(database == db[3]) venn.plot <-venn.diagram( x = list(target1=unique(row1$target_symbol) target2=unique(row2$target_symbol) target3=unique(row3$target_symbol)) filename ="Venn1.tif" lty ="dotted" lwd =0.5 col ="black" fill =c("dodgerblue" "goldenrod1" "darkorange1") alpha =0.60 cat.col =c("dodgerblue" "goldenrod1" "darkorange1") cat.cex =1 cat.fontface="bold" margin =0.05 cex =1) # 获取交集 venn_list <- list(target1=unique(row1$target_symbol) target2=unique(row2$target_symbol) target3=unique(row3$target_symbol)) for (i in 1:length(venn_list)) { if(i == 1){interGenes <- venn_list[[1]]} else{interGenes <- intersect(interGenes venn_list[[i]])} } interGenes # [1] "ADCY6" "MITF"

mirna预测分析(包搞定miRNA与靶基因预测问题)(6)

2

示例二:检索与给定药物/疾病相关的 miRNA-靶基因

以检索与顺铂有关的人类miRNA-靶基因相互作用关系为例,定义isease.drug参数为顺铂"cisplatin",table 参数为'disease.drug',结果有45个。

mirna预测分析(包搞定miRNA与靶基因预测问题)(7)

# 检索与顺铂有关的人类miRNA-靶基因相互作用关系 example2 <- get_multimir(org = "hsa" disease.drug = 'cisplatin' # 顺铂 table = 'disease.drug' summary = TRUE) # Searching mir2disease ... # Searching pharmaco_mir ... # Searching phenomir ... table(example2@data$type) # 查看检索结果 # disease.drug # 45 example2_result <- example2@data head(example2_result) # database mature_mirna_acc mature_mirna_id target_symbol target_entrez # 1 pharmaco_mir MIMAT0000772 hsa-miR-345-5p ABCC1 4363 # 2 pharmaco_mir MIMAT0000720 hsa-miR-376c-3p ALK7 # 3 pharmaco_mir MIMAT0000423 hsa-miR-125b-5p BAK1 578 # 4 pharmaco_mir hsa-miR-34 BCL2 596 # 5 pharmaco_mir MIMAT0000318 hsa-miR-200b-3p BCL2 596 # 6 pharmaco_mir MIMAT0000617 hsa-miR-200c-3p BCL2 596 # target_ensembl disease_drug paper_pubmedID type # 1 ENSG00000103222 cisplatin 20099276 disease.drug # 2 cisplatin 21224400 disease.drug # 3 ENSG00000030110 cisplatin 21823019 disease.drug # 4 ENSG00000171791 cisplatin 18803879 disease.drug # 5 ENSG00000171791 cisplatin 21993663 disease.drug # 6 ENSG00000171791 cisplatin 21993663 disease.drug

3

示例三:检索靶向调节给定基因的miRNA

以检索网站预测的小鼠中靶向调节Gnb1基因的miRNA为例,检索范围设置为预测评分TOP 35%,检索结果绘制upset展示。

# 检索网站预测的小鼠中靶向调节Gnb1基因的miRNA example3 <- get_multimir(org = "mmu" # 物种来源 hsa/mmu/rno target = "Gnb1" # 目标靶基因 table = "predicted" # 预测的miRNA-target summary = TRUE predicted.cutoff = 35 # 检索范围为预测评分TOP 35% predicted.cutoff.type = "p" # 检索范围为预测评分TOP 35% predicted.site = "all") # conserved/nonconserved/all # Searching diana_microt ... # Searching elmmo ... # Searching microcosm ... # Searching miranda ... # Searching mirdb ... # Searching pictar ... # Searching pita ... # Searching targetscan ... table(example3@data$type) # 查看检索结果 # predicted # 716 example3_result <- example3@data head(example3_result) # database mature_mirna_acc mature_mirna_id target_symbol target_entrez # 1 diana_microt MIMAT0000663 mmu-miR-218-5p Gnb1 14688 # 2 diana_microt MIMAT0017276 mmu-miR-493-5p Gnb1 14688 # 3 diana_microt MIMAT0000656 mmu-miR-139-5p Gnb1 14688 # 4 diana_microt MIMAT0014946 mmu-miR-3074-2-3p Gnb1 14688 # 5 diana_microt MIMAT0000144 mmu-miR-132-3p Gnb1 14688 # 6 diana_microt MIMAT0020608 mmu-miR-5101 Gnb1 14688 # target_ensembl score type # 1 ENSMUSG00000029064 0.975 predicted # 2 ENSMUSG00000029064 0.964 predicted # 3 ENSMUSG00000029064 0.96 predicted # 4 ENSMUSG00000029064 0.921 predicted # 5 ENSMUSG00000029064 0.92 predicted # 6 ENSMUSG00000029064 0.918 predicted example3_sum <- example3@summary # 统计检索结果 head(example3_sum) apply(example3@summary[ 6:13] 2 sum) # diana_microt elmmo microcosm miranda mirdb pictar # 105 108 5 49 59 18 # pita targetscan # 175 197

获得8个数据库预测结果,接下来绘制upset图,首先准备upset绘图文件,如下图所示。

mirna预测分析(包搞定miRNA与靶基因预测问题)(8)

# 准备upset绘图文件 db2 <- unique(example3_result$database) row1 <- example3_result %>% filter(database == db2[1]) row2 <- example3_result %>% filter(database == db2[2]) row3 <- example3_result %>% filter(database == db2[3]) row4 <- example3_result %>% filter(database == db2[4]) row5 <- example3_result %>% filter(database == db2[5]) row6 <- example3_result %>% filter(database == db2[6]) row7 <- example3_result %>% filter(database == db2[7]) row8 <- example3_result %>% filter(database == db2[8]) data3 <- cbind(unique(row1$mature_mirna_id) unique(row2$mature_mirna_id) unique(row3$mature_mirna_id) unique(row4$mature_mirna_id) unique(row5$mature_mirna_id) unique(row6$mature_mirna_id) unique(row7$mature_mirna_id) unique(row8$mature_mirna_id)) %>% as.data.frame() colnames(data3) <- db2 # 构建函数 Upsetdata <- function(data){ require(tidyverse) Genes <- data %>% do.call(c .) %>% unique() upset0 = matrix(NA nrow = length(Genes) ncol = length(data))%>% as.data.frame() %>% `rownames<-`(Genes) %>% rownames_to_column("Gene") colnames(upset0)<-c("Gene" names(data)) for (i in 2:c(length(data) 1)) { upset0[match(data[[i-1]] upset0[["Gene"]]) i] = 1 upset0[is.na(upset0[[i]]) i] <- 0 } return(upset0) } upset_data <- Upsetdata(data3)

接下来绘制upset图,需调用UpSetR包。

# 绘制upset图 library(UpSetR) upset(upset_data nsets =13 # 绘制全部13个基因集 默认绘制前5个基因集 keep.order=TRUE # 保持输入文件列表中基因顺序 point.size =2 matrix.color=c("#BC80BD") # 点线图颜色 line.size =0.3 matrix.dot.alpha = 0.5 # 非交集点透明度 shade.color = "blue" #背景颜色 shade.alpha = 0.1) #背景透明度

mirna预测分析(包搞定miRNA与靶基因预测问题)(9)

4

示例四:批量检索目标miRNA靶基因或靶向目标基因的miRNA

以示例数据进行演示,示例数据DE.miRNA.up包含 9 个上调miRNA,DE.entrez.dn有47个下调的基因,加载方式为load(url("http://multimir.org/bladder.rda"))。

# 加载示例数据 # DE.miRNA.up包含 9 个上调miRNA,DE.entrez.dn有47个下调的基因 load(url("http://multimir.org/bladder.rda")) View(DE.gene.info) View(DE.miRNA.info)

mirna预测分析(包搞定miRNA与靶基因预测问题)(10)

mirna预测分析(包搞定miRNA与靶基因预测问题)(11)

以已知miRNA列表预测靶基因为例,结果14个数据库预测结果中,实验验证的有24806个,网站预测的有37331个,疾病/药物相关的有451个。

# 已知miRNA列表预测靶基因 example4 <- get_multimir(org = "hsa" mirna = DE.miRNA.up table = "all" # 检索14个数据库 summary = TRUE predicted.cutoff.type = "p" predicted.cutoff = 10 use.tibble = TRUE) # Searching mirecords ... # Searching mirtarbase ... # Searching tarbase ... # Searching diana_microt ... # Searching elmmo ... # Searching microcosm ... # Searching miranda ... # Searching mirdb ... # Searching pictar ... # Searching pita ... # Searching targetscan ... # Searching pharmaco_mir ... # Joining by = c("database" "mature_mirna_acc" "mature_mirna_id" "target_symbol" "target_entrez" "target_ensembl" "type") # Joining by = c("database" "mature_mirna_acc" "mature_mirna_id" "target_symbol" "target_entrez" "target_ensembl" "type") save(example4 file = "example4.Rdata") # 时间较长,建议及时保存检索结果 load(file = "example4.Rdata") table(example4@data$type) # disease.drug predicted validated # 451 37331 24806 example4_result <- example4@data head(example4_result) example4_sum <- example4@summary head(example4_sum) apply(example4@summary[ 6:23] 2 sum) # diana_microt elmmo microcosm mir2disease miranda mirdb # 9015 18378 696 114 4048 2722 # mirecords mirtarbase pharmaco_mir phenomir pictar pita # 115 4175 9 328 25 1656 # tarbase targetscan disease.sum predicted.sum validated.sum all.sum # 20516 791 22 21626 22970 45847 # 已知mRNA列表检索miRNA example4_2 <- get_multimir(org = "hsa" target = DE.entrez.dn table = "all" # 检索14个数据库 summary = TRUE predicted.cutoff.type = "p" predicted.cutoff = 10 use.tibble = TRUE) save(example4_2 file = "example4_2.Rdata") # 时间较长,建议及时保存检索结果 load(file = "example4_2.Rdata") table(example4@data$type) example4_result2 <- example4@data head(example4_result2) example4_sum2 <- example4@summary head(example4_sum2) apply(example4@summary[ 6:23] 2 sum)

2

示例五:检索一组 miRNA 和一组基因之间的相互作用

以示例数据进行演示,示例数据DE.miRNA.up包含 9 个上调miRNA,DE.entrez.dn有47个下调的基因,加载方式为load(url("http://multimir.org/bladder.rda"))。结果显示,网站预测、实验验证和疾病/药物相关的miRNA与靶基因调节关系有160个、98个和442个。

# 加载示例数据 load(url("http://multimir.org/bladder.rda")) # 数据库检索 example5 <- get_multimir(org = "hsa" mirna = DE.miRNA.up target = DE.entrez.dn table = "all" summary = TRUE predicted.cutoff.type = "p" predicted.cutoff = 10 use.tibble = TRUE) save(example5 file = "example5.Rdata") load(file = "example5.Rdata") table(example5@data$type) # disease.drug predicted validated # 442 160 98 example5_result <- example5@data head(example5_result) example5_sum <- example5@summary head(example5_sum) apply(example5@summary[ 6:19] 2 sum) # diana_microt elmmo mir2disease miranda mirdb mirtarbase # 27 92 114 13 11 8 # phenomir pita tarbase targetscan disease.sum predicted.sum # 328 14 90 3 16 88 # validated.sum all.sum # 91 198

接下来,筛选出实验验证的检索结果,有85个miRNA与靶基因调节关系,其中7个miRNA和41个mRNA,可按照不同数据库展示检索结果。

# 筛选实验验证的结果 result <- select(example5 keytype = "type" keys = "validated" columns = columns(example5)) # 去重 unique_pairs <- result[!duplicated(result[ c("mature_mirna_id" "target_entrez")]) ] # 去重方法一 unique_pairs_2 <- unique(data.frame(miRNA.ID = as.character(result$mature_mirna_id) target.Entrez = as.character(result$target_entrez))) # 去重方法二 nrow(unique_pairs) # 85个miRNA-target nrow(unique_pairs_2) # 85个miRNA-target unique_miRNA <- unique(result$mature_mirna_id) length(unique_miRNA) # 7个miRNA unique_target <- unique(result$target_symbol) length(unique_target) # 41个mRNA # 按照不同数据库展示检索结果 example5_split <- split(result result$database) # 按照数据库分类 example5_split$mirtarbase # # A tibble: 8 x 13 # database mature_mirna_acc mature_mirna_id target_symbol target_entrez # <chr> <chr> <chr> <chr> <chr> # 1 mirtarbase MIMAT0000087 hsa-miR-30a-5p FDX1 2230 # 2 mirtarbase MIMAT0000259 hsa-miR-182-5p CUL5 8065 # 3 mirtarbase MIMAT0000418 hsa-miR-23b-3p RRAS2 22800 # 4 mirtarbase MIMAT0000087 hsa-miR-30a-5p LIMCH1 22998 # 5 mirtarbase MIMAT0000418 hsa-miR-23b-3p SWAP70 23075 # 6 mirtarbase MIMAT0000087 hsa-miR-30a-5p PEG10 23089 # 7 mirtarbase MIMAT0000245 hsa-miR-30d-5p PEG10 23089 # 8 mirtarbase MIMAT0000420 hsa-miR-30b-5p PEG10 23089 # # ... with 8 more variables: target_ensembl <chr> experiment <chr> # # support_type <chr> pubmed_id <chr> type <chr> score <chr> # # disease_drug <chr> paper_pubmedID <chr> example5_split$tarbase # # A tibble: 90 x 13 # database mature_mirna_acc mature_mirna_id target_symbol target_entrez # <chr> <chr> <chr> <chr> <chr> # 1 tarbase MIMAT0000418 hsa-miR-23b-3p LIMA1 51474 # 2 tarbase MIMAT0000449 hsa-miR-146a-5p LIMA1 51474 # 3 tarbase MIMAT0000259 hsa-miR-182-5p FOXN3 1112 # 4 tarbase MIMAT0000449 hsa-miR-146a-5p FOXN3 1112 # 5 tarbase MIMAT0000418 hsa-miR-23b-3p BCAT1 586 # 6 tarbase MIMAT0000449 hsa-miR-146a-5p BCAT1 586 # 7 tarbase MIMAT0000080 hsa-miR-24-3p BCAT1 586 # 8 tarbase MIMAT0000087 hsa-miR-30a-5p LIMCH1 22998 # 9 tarbase MIMAT0000259 hsa-miR-182-5p LIMCH1 22998 # 10 tarbase MIMAT0000259 hsa-miR-182-5p IPO5 3843 # # ... with 80 more rows and 8 more variables: target_ensembl <chr> # # experiment <chr> support_type <chr> pubmed_id <chr> type <chr> # # score <chr> disease_drug <chr> paper_pubmedID <chr>

我们在筛选一个与膀胱癌相关的检索结果,即hsa-miR-23b-3p和hsa-miR-146a-5p。再检索一下实验验证的靶基因,以hsa-miR-23b-3p为例,结果以Veen图展示,靶基因有3个,即"NOTCH1","PLAU"和"MET"。

# 筛选膀胱癌相关结果 mykeys <- keys(example5 keytype = "disease_drug") mykeys <- mykeys[grep("bladder" mykeys ignore.case = TRUE)] result2 <- select(example5 keytype = "disease_drug" keys = mykeys columns = columns(example5)) unique_pairs2 <- result2[!duplicated(result[ c("mature_mirna_id" "target_entrez")]) ] nrow(unique_pairs2) # 2个miRNA-target # [1] 2 unique_pairs2 # # A tibble: 2 x 13 # database mature_mirna_acc mature_mirna_id target_symbol target_entrez # <chr> <chr> <chr> <chr> <chr> # 1 mir2disease MIMAT0000418 hsa-miR-23b-3p NA NA # 2 phenomir MIMAT0000449 hsa-miR-146a-5p NA NA # # ... with 8 more variables: target_ensembl <chr> experiment <chr> # # support_type <chr> pubmed_id <chr> type <chr> score <chr> # # disease_drug <chr> paper_pubmedID <chr> # 以hsa-miR-23b-3p为例检索实验验证的靶基因 example5_2 <- get_multimir(org = "hsa" mirna = 'hsa-miR-23b-3p' table = "validated" summary = TRUE) apply(example5_2@summary[ 6:10] 2 sum) # mirecords mirtarbase tarbase validated.sum all.sum # 7 433 3660 3750 3750 example1_Lucifer <- example5_2@data[grep("Luciferase" example1_result[ "experiment"]) ] nrow(example1_Lucifer) # Luciferase实验验证结果4条 # [1] 42 # 绘制Veen图 library(VennDiagram) library(tidyverse) db <- unique(example5_2@data$database) row1 <- example5_2@data %>% filter(database == db[1]) row2 <- example5_2@data %>% filter(database == db[2]) row3 <- example5_2@data %>% filter(database == db[3]) venn.plot <-venn.diagram( x = list(target1=unique(row1$target_symbol) target2=unique(row2$target_symbol) target3=unique(row3$target_symbol)) filename ="Venn1.tif" lty ="dotted" lwd =0.5 col ="black" fill =c("dodgerblue" "goldenrod1" "darkorange1") alpha =0.60 cat.col =c("dodgerblue" "goldenrod1" "darkorange1") cat.cex =1 cat.fontface="bold" margin =0.05 cex =1) # 获取交集 venn_list <- list(target1=unique(row1$target_symbol) target2=unique(row2$target_symbol) target3=unique(row3$target_symbol)) for (i in 1:length(venn_list)) { if(i == 1){interGenes <- venn_list[[1]]} else{interGenes <- intersect(interGenes venn_list[[i]])} } interGenes [1] "NOTCH1" "PLAU" "MET"

mirna预测分析(包搞定miRNA与靶基因预测问题)(12)

以上就是multiMiR包全部内容,开发并维护R包不易,小伙伴们使用时别忘记引用以下文献哦~!

Ru Y Kechris KJ Tabakoff B Hoffman P Radcliffe RA Bowler R Mahaffey S Rossi S Calin GA Bemis L Theodorescu D. The multiMiR R package and database: integration of microRNA-target interactions along with their disease and drug associations. Nucleic Acids Res. 2014;42(17):e133. doi: 10.1093/nar/gku631. Epub 2014 Jul 24. PMID: 25063298; PMCID: PMC4176155.

mirna预测分析(包搞定miRNA与靶基因预测问题)(13)

猜您喜欢: