MongoDB paginated queries (a Java implementation of paging through hundreds of millions of MongoDB records)

1. Preparing the environment

1.1 Download MongoDB

1.2 Start MongoDB

C:\mongodb\bin\mongod --dbpath D:\mongodb\data

1.3 Download Robo 3T, a GUI tool for MongoDB

2. Preparing the data

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.6.1</version>
</dependency>

Run the following Java code to insert the test data:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;

// The class name is only a placeholder; the original post shows just the main method.
public class InsertTestData {

    public static void main(String[] args) {
        try {
            /**** Connect to MongoDB ****/
            // Since 2.10.0 the driver uses MongoClient
            MongoClient mongo = new MongoClient("localhost", 27017);

            /**** Get database ****/
            // if the database doesn't exist, MongoDB will create it for you
            DB db = mongo.getDB("www");

            /**** Get collection / table from 'www' ****/
            // if the collection doesn't exist, MongoDB will create it for you
            DBCollection table = db.getCollection("person");

            /**** Insert ****/
            // create a document to store key and value
            BasicDBObject document = null;
            for (int i = 0; i < 100000000; i++) {
                document = new BasicDBObject();
                document.put("name", "mkyong" + i);
                document.put("age", 30);
                document.put("sex", "f");
                table.insert(document);
            }

            /**** Done ****/
            System.out.println("Done");
            mongo.close();
        } catch (MongoException e) {
            // Driver 3.x constructors no longer declare UnknownHostException,
            // so only MongoException is caught here.
            e.printStackTrace();
        }
    }
}

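3. Paginated queries

The traditional limit-based way of paging becomes slow once the collection is large, so it is not a good fit here. Looking for another approach, I borrowed the idea used in logstash-input-mongodb, which remembers the last _id it has read and asks only for documents after it: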

public
def get_cursor_for_collection(mongodb, mongo_collection_name, last_id_object, batch_size)
  collection = mongodb.collection(mongo_collection_name)
  # Need to make this sort by date in object id then get the first of the series
  # db.events_20150320.find().limit(1).sort({ts:1})
  return collection.find({:_id => {:$gt => last_id_object}}).limit(batch_size)
end

# ... later in the same file ...
collection_name = collection[:name]
@logger.debug("collection_data is: #{@collection_data}")
last_id = @collection_data[index][:last_id]
#@logger.debug("last_id is #{last_id}", :index => index, :collection => collection_name)
# get batch of events starting at the last_place if it is set
last_id_object = last_id
if since_type == 'id'
  last_id_object = BSON::ObjectId(last_id)
elsif since_type == 'time'
  if last_id != ''
    last_id_object = Time.at(last_id)
  end
end
cursor = get_cursor_for_collection(@mongodb, collection_name, last_id_object, batch_size)
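To make the idea concrete without a running database, here is a minimal in-memory sketch of the same pattern. It is not MongoDB driver code: a sorted `TreeSet<Long>` stands in for a collection indexed on `_id`, and the class and method names (`KeysetPagingSketch`, `nextPage`, `walkAll`) are made up for illustration. Each page is "everything strictly greater than the last id seen, capped at the page size", which is the same shape as the `$gt` plus `limit` query in the excerpt above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// In-memory stand-in for a collection indexed on _id (an assumption for illustration only).
class KeysetPagingSketch {

    // Returns up to pageSize ids strictly greater than lastId, in ascending order.
    // Same shape as find({_id: {$gt: lastId}}).sort({_id: 1}).limit(pageSize).
    // A null lastId means "start from the very first id".
    static List<Long> nextPage(NavigableSet<Long> ids, Long lastId, int pageSize) {
        NavigableSet<Long> remaining = (lastId == null) ? ids : ids.tailSet(lastId, false);
        List<Long> page = new ArrayList<>();
        for (Long id : remaining) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    // Walks the whole set page by page and returns how many ids were visited.
    static int walkAll(NavigableSet<Long> ids, int pageSize) {
        int seen = 0;
        Long lastId = null;
        while (true) {
            List<Long> page = nextPage(ids, lastId, pageSize);
            if (page.isEmpty()) {
                break; // nothing left after lastId
            }
            seen += page.size();
            lastId = page.get(page.size() - 1); // resume after the last id of this page
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableSet<Long> ids = new TreeSet<>();
        for (long i = 1; i <= 10; i++) {
            ids.add(i);
        }
        System.out.println(nextPage(ids, null, 4)); // [1, 2, 3, 4]
        System.out.println(nextPage(ids, 4L, 4));   // [5, 6, 7, 8]
        System.out.println(walkAll(ids, 4));        // 10
    }
}
```

The point of the design is that each page starts with an index seek to `lastId` rather than a walk over all earlier entries, so the cost of a page does not grow with how deep into the data it is.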

The Java implementation:

import java.util.List;

import org.bson.types.ObjectId;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;

public class Test {

    public static void main(String[] args) {
        int pageSize = 50000;
        try {
            /**** Connect to MongoDB ****/
            // Since 2.10.0 the driver uses MongoClient
            MongoClient mongo = new MongoClient("localhost", 27017);

            /**** Get database ****/
            // if the database doesn't exist, MongoDB will create it for you
            DB db = mongo.getDB("www");

            /**** Get collection / table from 'www' ****/
            // if the collection doesn't exist, MongoDB will create it for you
            DBCollection table = db.getCollection("person");

            DBCursor dbObjects;
            Long cnt = table.count();
            // System.out.println(table.getStats());
            Long page = getPageSize(cnt, pageSize);
            // _id to resume after; this value comes from the author's own data set
            ObjectId lastIdObject = new ObjectId("5bda8f66ef2ed979bab041aa");
            for (Long i = 0L; i < page; i++) {
                Long start = System.currentTimeMillis();
                dbObjects = getCursorForCollection(table, lastIdObject, pageSize);
                // The cursor is lazy: toArray() is what actually fetches the page,
                // so it is included in the measured time.
                List<DBObject> objs = dbObjects.toArray();
                System.out.println("Query " + (i + 1) + " took "
                        + (System.currentTimeMillis() - start) / 1000 + " s");
                if (objs.isEmpty()) {
                    break; // nothing left after lastIdObject
                }
                lastIdObject = (ObjectId) objs.get(objs.size() - 1).get("_id");
            }
            mongo.close();
        } catch (MongoException e) {
            // Driver 3.x constructors no longer declare UnknownHostException,
            // so only MongoException is caught here.
            e.printStackTrace();
        }
    }

    public static DBCursor getCursorForCollection(DBCollection collection, ObjectId lastIdObject, int pageSize) {
        DBCursor dbObjects = null;
        if (lastIdObject == null) {
            // TODO: sort by _id and take the first document; otherwise data may be lost
            lastIdObject = (ObjectId) collection.findOne().get("_id");
        }
        BasicDBObject query = new BasicDBObject();
        query.append("_id", new BasicDBObject("$gt", lastIdObject));
        BasicDBObject sort = new BasicDBObject();
        sort.append("_id", 1);
        dbObjects = collection.find(query).limit(pageSize).sort(sort);
        return dbObjects;
    }

    // Number of pages needed for cnt records (rounds up when there is a partial last page)
    public static Long getPageSize(Long cnt, int pageSize) {
        return cnt % pageSize == 0 ? cnt / pageSize : cnt / pageSize + 1;
    }
}
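The page-count helper in the listing above is a ceiling division. As a small aside, the same result can be written without the modulo test; the sketch below assumes a non-negative record count and a positive page size, and the names `PageCountSketch` and `pageCount` are hypothetical, not part of the original code.

```java
// Ceiling division for page counts; a standalone sketch, independent of the MongoDB driver.
class PageCountSketch {

    // Returns the number of pages needed to hold `total` records at `pageSize` per page.
    // Assumes total >= 0 and pageSize > 0.
    static long pageCount(long total, int pageSize) {
        if (total < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and pageSize must be > 0");
        }
        // Adding pageSize - 1 before dividing rounds any partial page up to a full one.
        return (total + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pageCount(100_000_000L, 50_000)); // 2000
        System.out.println(pageCount(101, 50));              // 3
        System.out.println(pageCount(0, 50));                // 0
    }
}
```

With the data set from section 2 (100,000,000 records) and the page size of 50,000 used above, this gives 2,000 pages. When the starting `_id` is not the first one in the collection, fewer pages will actually contain data, which is why stopping on an empty page is a useful extra guard.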

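4. Some lessons learned

1. I accidentally left out a $ sign in the line below, so the query returned no data, and I wasted some time tracking down the cause.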

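query.append("_id", new BasicDBObject("$gt", lastIdObject));

2. Creating indexes

Create an ordinary single-field index: db.collection.ensureIndex({field: 1/-1}); 1 means ascending and -1 means descending. Note that ensureIndex has been deprecated since MongoDB 3.0 in favor of createIndex, which takes the same key specification.

Example: db.articles.ensureIndex({title: 1}) // note from the source: do not wrap the field name in "" double quotes, or the index may fail to be created

Check the current indexes: db.collection.getIndexes();

Example: db.articles.getIndexes();

Drop a single index: db.collection.dropIndex({field: 1/-1});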

3. Execution plans

db.student.find({"name":"dd1"}).explain()

References:

【1】https://github.com/phutchins/logstash-input-mongodb/blob/master/lib/logstash/inputs/mongodb.rb

【2】https://www.cnblogs.com/yxlblogs/p/4930308.html

【3】https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/
