Using Aggregation Commands (3) Aggregate Pipeline (i) 👨‍🏫 Basic Stages

Aggregation Pipeline — MongoDB Manual

docs.mongodb.com

A pipeline is a structure in which the output of one processing step becomes the input of the next. You have probably used pipelines in gulp; MongoDB uses the term in much the same sense.

👨‍🏫 Basic stages

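As the structure of aggregate shows, the stages run in order from top to bottom. Documents pass through this pipeline and are shaped into the information you want. (Frankly, it is easier to use than map-reduce and generally performs better.)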

db.collection.aggregate([
   { $match: { status: "A" } }, // match stage
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } } // group stage
])
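
To make the "output of one stage is the input of the next" idea concrete, here is a minimal plain-JavaScript sketch that imitates the $match and $group stages above on an in-memory array. The sample documents and the helper names (matchStatusA, groupTotalByCustomer, runPipeline) are made up for illustration; this is not how MongoDB implements the pipeline internally.

```javascript
// Hypothetical sample documents; not from a real collection.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// Each stage is a function from an array of documents to a new array.
// Imitates { $match: { status: "A" } }.
const matchStatusA = (docs) => docs.filter((d) => d.status === "A");

// Imitates { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }.
const groupTotalByCustomer = (docs) => {
  const sums = new Map();
  for (const d of docs) {
    sums.set(d.cust_id, (sums.get(d.cust_id) ?? 0) + d.amount);
  }
  return [...sums].map(([_id, total]) => ({ _id, total }));
};

// Run the stages in order: the output of one is the input of the next.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const totals = runPipeline(orders, [matchStatusA, groupTotalByCustomer]);
console.log(totals);
```

The order with status "D" is filtered out before grouping, so it never contributes to a total.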

The available stages are listed in the official documentation below. We will not cover all of them, only the ones that are used often and are most useful.

https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/

$project

Reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, outputs one document.

// include or exclude an existing field
{$project:  {<field>: <boolean>}}

// add a new field
{$project:  {<field>: <expression>}}

// hide _id and show only rating
db.rating.aggregate([
    {$project: {_id: 0, rating: 1}}
])

// create a new field named test
db.rating.aggregate([
    {$project: {_id: 0, rating: 1, test: "hello world"}}
])

// create a field named multiply holding _id * user_id (both fields must be numeric)
db.rating.aggregate([
    {$project: {_id: 0, multiply: {
            $multiply: ["$_id", "$user_id"]
     }}}
])
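
As a rough mental model, $project behaves like a map over the documents: one output document per input document, carrying only the fields you asked for plus any computed ones. The sketch below imitates that in plain JavaScript; the sample data (including the numeric _id) and the field name product are assumptions for illustration.

```javascript
// Hypothetical rating documents; a numeric _id is assumed here.
const ratings = [
  { _id: 1, user_id: 6, rating: 5 },
  { _id: 2, user_id: 9, rating: 4 },
];

// Imitates:
// {$project: {_id: 0, rating: 1, product: {$multiply: ["$_id", "$user_id"]}}}
const projected = ratings.map((doc) => ({
  rating: doc.rating,             // rating: 1 keeps the field
  product: doc._id * doc.user_id, // computed field from an expression
  // _id: 0 means _id is simply not copied
}));

console.log(projected);
```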

$group

https://docs.mongodb.com/manual/reference/operator/aggregation/group/#pipe._S_group

Groups input documents by a specified identifier expression and applies the accumulator expression(s), if specified, to each group. Consumes all input documents and outputs one document per each distinct group. The output documents only contain the identifier field and, if specified, accumulated fields.

{
  $group:
    {
      _id: <expression>, // Group By Expression
      <field1>: { <accumulator1> : <expression1> },
      <field2>: { <accumulator2> : <expression2> },
      ...
    }
 }

Accumulator operators that can be used inside $group include:

$first, $last, $max, $min, $avg, $sum

$push, $addToSet (both return an array, but $push keeps duplicates while $addToSet removes them; a set holds no duplicates)

// group by rating and add 1 per document into a count field
db.rating.aggregate([
    {$group: {_id: "$rating", count: {$sum: 1}}}
])

// group by rating and store the largest user_id in each group in a test field
db.rating.aggregate([
    {$group: {_id: "$rating", test: {$max: "$user_id"}}}
])

// group by rating and return the user_id values of each group as an array (duplicates allowed)
db.rating.aggregate([
    {$group: {_id: "$rating", test: {$push: "$user_id"}}}
])
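
The difference between the accumulators is easiest to see side by side. This plain-JavaScript sketch groups hypothetical documents by rating and computes the equivalents of $sum: 1, $max, $push, and $addToSet in one pass. The output field names (count, maxUser, pushed, unique) are illustrative. Note that MongoDB does not guarantee the element order of an $addToSet result, whereas this sketch happens to keep insertion order.

```javascript
// Hypothetical rating documents.
const ratings = [
  { user_id: 6, rating: 5 },
  { user_id: 9, rating: 5 },
  { user_id: 6, rating: 5 },
  { user_id: 3, rating: 4 },
];

const groups = new Map();
for (const { rating, user_id } of ratings) {
  if (!groups.has(rating)) {
    groups.set(rating, {
      _id: rating,
      count: 0,            // like {$sum: 1}
      maxUser: -Infinity,  // like {$max: "$user_id"}
      pushed: [],          // like {$push: "$user_id"}
      unique: new Set(),   // like {$addToSet: "$user_id"}
    });
  }
  const g = groups.get(rating);
  g.count += 1;
  g.maxUser = Math.max(g.maxUser, user_id);
  g.pushed.push(user_id);
  g.unique.add(user_id);
}

// Convert each Set to an array so the result looks like a document.
const grouped = [...groups.values()].map((g) => ({ ...g, unique: [...g.unique] }));
console.log(grouped);
```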

$match

Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.

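It works much like the find command and accepts the same query syntax.

{ $match: { <query> } }

// to filter with an aggregation expression, wrap it in $expr
{ $match: { $expr: { <aggregation expression> } } }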

// first keep only documents with rating >= 4, then group by rating and collect user_id into an array named user_ids
db.rating.aggregate([
    {$match: {rating: {$gte: 4}}},
    {$group: {_id: "$rating", user_ids: {$push: "$user_id"}}}
])

$unwind

Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.

Simply put, it splits an array so that each element becomes its own document.

{
  $unwind:
    {
      path: <field path>,
      includeArrayIndex: <string>, // optional: name of a new field that holds the element's array index
      preserveNullAndEmptyArrays: <boolean> // if true, also output a document when the field is null, missing, or an empty array (default: false)
    }
}

db.rating.aggregate([
    {$match: {rating: {$gte: 4}}},
    {$group: {_id: "$rating", user_ids: {$push: "$user_id"}}},
    {$unwind: "$user_ids"}
])

// without $unwind
{
    "_id" : 5.0,
    "user_ids" : [ 
        6.0, 
        9.0, 
        10.0, 
        5.0
    ]
}

// with $unwind (one document per element, in array order)
{"_id" : 5.0,"user_ids" : 6.0}
{"_id" : 5.0,"user_ids" : 9.0}
{"_id" : 5.0,"user_ids" : 10.0}
{"_id" : 5.0,"user_ids" : 5.0}

$out

https://docs.mongodb.com/manual/reference/operator/aggregation/out/

Takes the documents returned by the aggregation pipeline and writes them to a specified collection. The $out operator must be the last stage in the pipeline. The $out operator lets the aggregation framework return result sets of any size.

Writes the result of the aggregation to a collection instead of returning it to the client.

If a collection with the same name already exists, it is replaced.

{ $out: "<output-collection>" }

db.rating.aggregate([
    {$match: {rating: {$gte: 4}}},
    {$group: {_id: "$rating", user_ids: {$push: "$user_id"}}},
    {$out: "user_ids_by_rating"}
])
    
db.user_ids_by_rating.find()

👨‍🏫 Input control stages

$limit

Passes only the given number of documents to the next stage.

{ $limit: <positive integer> }

$skip

Skips the given number of documents and passes the rest to the next stage.

{ $skip: <positive integer> }

$sort

Sorts the documents: 1 for ascending order, -1 for descending order.

{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

An example that applies these stages:

// sort by user_id in descending order, keep the top 5, then skip the first 1
db.rating.aggregate([
    {$sort: {user_id: -1}},
    {$limit: 5},
    {$skip: 1}
])

// 4 documents are output in total (assuming the collection holds at least 5)
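
Because each stage works on the previous stage's output, the order of $limit and $skip changes the result. The plain-JavaScript sketch below imitates both orders on eight hypothetical documents; the helper names sortDesc, limit, and skip are made up for illustration.

```javascript
// Eight hypothetical documents with user_id 1..8.
const docs = Array.from({ length: 8 }, (_, i) => ({ user_id: i + 1 }));

// Stage imitations: each returns a new array and leaves its input untouched.
const sortDesc = (list) => [...list].sort((a, b) => b.user_id - a.user_id);
const limit = (n) => (list) => list.slice(0, n);
const skip = (n) => (list) => list.slice(n);

// $sort -> $limit: 5 -> $skip: 1  (keep 5, then drop the first one)
const limitThenSkip = skip(1)(limit(5)(sortDesc(docs)));

// $sort -> $skip: 1 -> $limit: 5  (drop the first one, then keep 5)
const skipThenLimit = limit(5)(skip(1)(sortDesc(docs)));

console.log(limitThenSkip.map((d) => d.user_id));
console.log(skipThenLimit.map((d) => d.user_id));
```

Limiting first leaves 4 documents, while skipping first leaves 5, which is why paging is usually written as $skip followed by $limit.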

👨‍🏫 Example

The stages covered above can be combined like this:

db.local.aggregate([
    {$match: {sub_category: "의회비"}},
    {$group: {
            _id: "$city_or_province", expense_avg: {$avg: "$this_term_expense"}
    }},
    {$sort: {expense_avg: -1}}
])
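
To see what this pipeline computes, here is a plain-JavaScript sketch that walks through the same three steps (filter, average per group, sort descending) on a few hypothetical documents. Only the field names and the "의회비" category value come from the query above; the cities and expense figures are invented.

```javascript
// Hypothetical documents shaped like the local collection.
const local = [
  { city_or_province: "Seoul", sub_category: "의회비", this_term_expense: 100 },
  { city_or_province: "Seoul", sub_category: "의회비", this_term_expense: 300 },
  { city_or_province: "Busan", sub_category: "의회비", this_term_expense: 500 },
  { city_or_province: "Busan", sub_category: "other", this_term_expense: 900 },
];

// $match: keep only the requested sub_category.
const matched = local.filter((d) => d.sub_category === "의회비");

// $group with $avg: track a running sum and count per city_or_province.
const sums = new Map();
for (const d of matched) {
  const g = sums.get(d.city_or_province) ?? { sum: 0, count: 0 };
  g.sum += d.this_term_expense;
  g.count += 1;
  sums.set(d.city_or_province, g);
}
const averaged = [...sums].map(([_id, g]) => ({
  _id,
  expense_avg: g.sum / g.count,
}));

// $sort: highest average first.
const result = averaged.sort((a, b) => b.expense_avg - a.expense_avg);
console.log(result);
```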

darren, dev blog