Improving Pagination Efficiency for Data in MongoDB

This is the current logic for fetching the best posts. It uses skip and limit to split the results into pages.

apiRouter.post("/post/request/best", (req, res) => {
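  const { page } = req.body;

  // Only posts whose like count reaches the BestHuddle threshold qualify.
  // skip/limit selects page N: drop the first (page - 1) * 20 matches, return the next 20.
  Post.find({ liked: { $gte: BestHuddle } }, null, {
    skip: (page - 1) * 20,
    limit: 20,
  })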
    .sort({ createdAt: -1 })
    .populate("author")
    .populate("comments")
    .exec(function (err, post) {
      if (err) {
        console.log(err);
        return res.json({ postFound: false, err: err.message });
      }
      return res.json({ postFound: true, posts: post });
    });
});

I assumed there was nothing wrong with this approach, but an answer on stackoverflow points out the following problem with skip/limit pagination.

this is not efficient if there are huge documents in the collection. Suppose you have 100M documents, and you want to get the data from the middle offset(50Mth). MongoDB has to build up the full dataset and walk from the beginning to the specified offset, this will be low performance. As your offset increases, the performance keeps degrade.

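In other words, the server has to build up the result set and walk through it from the beginning before it reaches the offset, so performance gets worse as the offset grows. As for how long it actually takes, there was this comment: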

Actually with 1M records, I see one find(Criteria).skip().limit().sort() and one find(Criteria).count() will return me correct total number and pagination in 5s(1st query)/3s(2nd query) with arbitrary page number. Not very very good, but no other choices.

Hmm... five seconds is honestly quite slow. Nobody is going to wait five seconds just to move to another page.
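To get a feel for why the offset matters, here is a toy model in plain JavaScript. It is only a sketch: a sorted array stands in for an index on _id, and the functions offsetPage and rangePage are names I made up for illustration. It does not reproduce MongoDB's real execution engine; it just counts how many entries each strategy has to step over to return the same page.

```javascript
// Toy model only: a sorted array stands in for an index on _id.
// This is NOT how MongoDB is implemented; it only counts visited entries.
const ids = Array.from({ length: 100000 }, (_, i) => i + 1);

// Offset paging: walk from the start, discard `skip` entries, then keep `limit`.
function offsetPage(sortedIds, skip, limit) {
  let visited = 0;
  const page = [];
  for (const id of sortedIds) {
    if (page.length === limit) break;
    visited += 1;
    if (visited > skip) page.push(id);
  }
  return { page, visited };
}

// Range paging: binary-search to the first id greater than lastId,
// then read `limit` entries from there.
function rangePage(sortedIds, lastId, limit) {
  let lo = 0;
  let hi = sortedIds.length;
  let steps = 0;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    steps += 1;
    if (sortedIds[mid] <= lastId) lo = mid + 1;
    else hi = mid;
  }
  const page = sortedIds.slice(lo, lo + limit);
  return { page, visited: steps + page.length };
}

const byOffset = offsetPage(ids, 50000, 20);
const byRange = rangePage(ids, 50000, 20);
console.log(byOffset.visited, byRange.visited);
```

Both calls return the same 20 ids, but in this model the offset version visits 50020 entries while the range version visits only a few dozen, and the gap widens as the page number grows.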

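Solution 1) Using mongoose-aggregate-paginate-v2

Relying on yet another package is not a good habit, so I decided not to use it this time.

Besides, there will be cases where I have to use the shell rather than mongoose, so skipping over how pagination actually works does not seem like a good idea.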

kb.objectrocket.com/mongo-db/mongoose-pagination-with-nodejs-and-mongodb-1304

mongoose Pagination with NodeJS and MongoDB | ObjectRocket


Solution 2) Fast Paging

The following way of implementing pagination is said to be more efficient. It does not use skip; instead it relies on the automatically assigned _id.

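// Page 1 (sort by _id so that "the last document" is well defined)
db.users.find().sort({ _id: 1 }).limit(pageSize);
// Find the _id of the last document on this page
last_id = ...

// Page 2: only documents whose _id is greater than last_id
users = db.users.find({ _id: { $gt: last_id } }).sort({ _id: 1 }).limit(pageSize);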

Put simply, you fetch a page, remember the _id of the last document on it, and when fetching the next page you request only the documents whose _id is greater than that one.
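Applied to the best-posts route from the top of this post, the idea might look roughly like the sketch below. buildBestPostsQuery is a hypothetical helper of my own, not part of mongoose, and it assumes the client sends back the _id of the last post it received (lastId) instead of a page number. It also sorts by _id descending rather than createdAt, on the assumption that both give approximately the same newest-first order.

```javascript
// Hypothetical helper (not a mongoose API): builds the pieces of a
// range-based query for the "best posts" listing.
function buildBestPostsQuery({ minLikes, lastId = null, pageSize = 20 }) {
  const filter = { liked: { $gte: minLikes } };
  // The first page has no cursor. Afterwards, fetch only documents whose
  // _id is smaller (that is, older) than the last one already shown.
  if (lastId !== null) {
    filter._id = { $lt: lastId };
  }
  return { filter, sort: { _id: -1 }, limit: pageSize };
}

// Assumed usage with the Post model from the route above:
//   const { filter, sort, limit } =
//     buildBestPostsQuery({ minLikes: BestHuddle, lastId });
//   Post.find(filter).sort(sort).limit(limit)
//     .populate("author").populate("comments")

const first = buildBestPostsQuery({ minLikes: 10 });
const next = buildBestPostsQuery({
  minLikes: 10,
  lastId: "542c2b97bac0595474108b48",
});
console.log(first.filter, next.filter);
```

The trade-off is that this only supports moving to the next page from a known position; jumping straight to an arbitrary page number still needs skip or some other scheme.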

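The reason comparison operators work on _id is that it is not just a plain string. An ObjectId begins with its creation timestamp, so ordering by _id roughly follows creation order.

Take an _id such as "542c2b97 bac059 5474 108b48" (spaces added for readability).

542c2b97 is the Unix time in seconds. Calling getTimestamp() on the ObjectId returns it as an ISO date.

(See the official documentation at docs.mongodb.com/manual/reference/method/ObjectId.getTimestamp/ for details.)

bac059 is the machine id.

5474 is the process id.

108b48 is a counter.

Note that this breakdown describes the older ObjectId layout. In current MongoDB versions the machine id and process id are replaced by a single 5-byte random value, while the leading timestamp and the trailing counter remain.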
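As a quick check of the timestamp part, the first 8 hex characters can be decoded without any driver at all. This is a minimal sketch; objectIdToDate is a name I chose for illustration, and inside the mongo shell you would simply call getTimestamp() instead.

```javascript
// Reads the creation time out of an ObjectId's 24-character hex string.
// Illustrative helper only; drivers and the shell provide getTimestamp().
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex string");
  }
  // The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("542c2b97bac0595474108b48");
console.log(created.toISOString()); // 2014-10-01T16:28:07.000Z
```

Because the timestamp sits at the front, a later ObjectId generally compares as greater than an earlier one, which is what the range query above depends on.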

References)

Fast and Efficient Pagination in MongoDB | Codementor (www.codementor.io)

What is the best way for pagination on mongodb using java (stackoverflow.com)

Implementing pagination in mongodb (stackoverflow.com)


darren, dev blog