Improving the all-trips lookup query #498

Combi153 · 2023-10-16T11:10:49Z

Combi153
Oct 16, 2023
Maintainer

Running two queries (instead of a subquery)

Trip to Point and Trip to Post are both one-to-many relationships. A join across them therefore produces duplicate Trip rows.
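The duplication is easy to reproduce on a small scale. The sketch below uses an in-memory SQLite database as a stand-in for the project's database, with an assumed minimal schema (only the columns the join needs) and made-up rows. It runs the same join with and without DISTINCT.

```python
import sqlite3

# Assumed minimal schema and sample rows; not the project's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trip  (trip_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE point (point_id INTEGER PRIMARY KEY, trip_id INTEGER, has_post INTEGER);
    CREATE TABLE post  (post_id INTEGER PRIMARY KEY, trip_id INTEGER, point_id INTEGER, address TEXT);
    INSERT INTO trip  VALUES (1, 'Seoul'), (2, 'Busan');
    INSERT INTO point VALUES (10, 1, 1), (11, 1, 1), (12, 1, 1), (20, 2, 1);
    INSERT INTO post  VALUES (100, 1, 10, 'Seoul A'), (101, 1, 11, 'Seoul B'),
                             (102, 1, 12, 'Seoul C'), (200, 2, 20, 'Busan A');
""")

join_sql = """
    SELECT {distinct} t.trip_id, t.name
    FROM trip t
    JOIN post p   ON t.trip_id = p.trip_id
    JOIN point pt ON p.point_id = pt.point_id
    ORDER BY t.trip_id DESC
"""

# Without DISTINCT, a trip appears once per joined post/point row.
plain_rows = conn.execute(join_sql.format(distinct="")).fetchall()
# With DISTINCT, each trip appears once.
distinct_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

print(plain_rows)
print(distinct_rows)
```

Trip 1 has three posts here, so the plain join returns it three times, while the DISTINCT version returns one row per trip.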

To avoid the duplicates, we added the DISTINCT keyword to the SELECT query. An example of that query follows.

SELECT DISTINCT t1_0.trip_id,
                t1_0.created_at,
                t1_0.image_url,
                t1_0.member_id,
                t1_0.name,
                t1_0.route_image_url,
                t1_0.status,
                t1_0.updated_at
FROM trip t1_0
         JOIN post p1_0 ON t1_0.trip_id = p1_0.trip_id
         JOIN point p2_0 ON p1_0.point_id = p2_0.point_id
ORDER BY t1_0.trip_id DESC LIMIT ?;

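With DISTINCT, the data is no longer duplicated. However, we found that a query containing DISTINCT is very slow in an environment with a large amount of data.

For the all-trips lookup with no conditions set, the query takes about 0.6 ms. The test environment is the API connected to the Dev DB, which holds about 100,000 rows.

To address this, we tried to introduce a subquery: remove the duplicates inside the subquery, then look up Trip with an IN clause.

However, libraries that build dynamic queries, such as QueryDsl and JPA Criteria, do not support LIMIT inside a subquery.

So instead of a subquery, we set it up to run two queries.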

select distinct p1_0.trip_id
from point p1_0
where p1_0.has_post = ?
order by p1_0.trip_id desc limit ?

This is the first query.

select t1_0.trip_id,
       t1_0.created_at,
       t1_0.image_url,
       t1_0.member_id,
       t1_0.name,
       t1_0.route_image_url,
       t1_0.status,
       t1_0.updated_at
from trip t1_0
where t1_0.trip_id in (?, ?, ?)
order by t1_0.trip_id desc

This is the second query.

The result of the first query goes into the IN clause of the second query.
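As a rough illustration of how the two queries fit together, here is a minimal sketch using an in-memory SQLite database. The function name `find_trips`, the reduced schema, and the sample rows are assumptions for the example. The project itself builds these queries through QueryDsl, not raw SQL.

```python
import sqlite3

def find_trips(conn, limit):
    # First query: distinct trip ids, with the LIMIT applied here.
    ids = [row[0] for row in conn.execute(
        "SELECT DISTINCT p.trip_id FROM point p "
        "WHERE p.has_post = ? ORDER BY p.trip_id DESC LIMIT ?",
        (1, limit),
    )]
    if not ids:
        # Some databases (MySQL, for example) reject an empty IN () list.
        return []
    # Second query: load the trips whose ids came back from the first query.
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        "SELECT t.trip_id, t.name FROM trip t "
        f"WHERE t.trip_id IN ({placeholders}) ORDER BY t.trip_id DESC",
        ids,
    ).fetchall()

# Assumed sample data: trip 2 has no point with a post, so it is filtered out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trip  (trip_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE point (point_id INTEGER PRIMARY KEY, trip_id INTEGER, has_post INTEGER);
    INSERT INTO trip  VALUES (1, 'Seoul'), (2, 'Busan'), (3, 'Jeju');
    INSERT INTO point VALUES (10, 1, 1), (11, 1, 1), (20, 2, 0), (30, 3, 1);
""")

print(find_trips(conn, 2))
```

Because the LIMIT is applied to the id list, the second query needs neither DISTINCT nor LIMIT. It only has to repeat the ORDER BY.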

We compared running two queries against running a single query with a restricted subquery, and the two-query version was more than twice as fast. We therefore chose to run two queries.

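Table joins by search condition

The tables and columns needed for the all-trips lookup depend on the search conditions.

We therefore branch on the search conditions and touch only the tables each branch needs.

The data required for each set of conditions is as follows.

No conditions: point.has_post
Time condition only: point.recorded_at, point.has_post
Address condition only: post.address
All conditions: point.recorded_at, post.address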
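The branching above could be sketched roughly as follows. `build_id_query` is a hypothetical helper, not the project's code. It returns the id-query text for each combination of conditions, copied from the queries in this post. The time predicates shown (dayofweek, year, month) are just the forms that appear in the examples here.

```python
def build_id_query(has_time: bool, has_address: bool) -> str:
    """Return the first (id) query for the given conditions. Hypothetical helper."""
    if has_time and has_address:
        # Needs point.recorded_at and post.address, so point and post are joined.
        return (
            "select distinct p1_0.trip_id from point p1_0 "
            "join post p2_0 on p2_0.point_id = p1_0.point_id "
            "where year(p1_0.recorded_at) = ? and month(p1_0.recorded_at) = ? "
            "and p2_0.address like ? escape '!' "
            "order by p1_0.trip_id desc limit ?"
        )
    if has_address:
        # Needs only post.address.
        return (
            "select distinct p1_0.trip_id from post p1_0 "
            "where p1_0.address like ? escape '!' "
            "order by p1_0.trip_id desc limit ?"
        )
    if has_time:
        # Needs point.recorded_at and point.has_post.
        return (
            "select distinct p1_0.trip_id from point p1_0 "
            "where dayofweek(p1_0.recorded_at) = ? and p1_0.has_post = ? "
            "order by p1_0.trip_id desc limit ?"
        )
    # No conditions: only point.has_post is needed.
    return (
        "select distinct p1_0.trip_id from point p1_0 "
        "where p1_0.has_post = ? "
        "order by p1_0.trip_id desc limit ?"
    )

print(build_id_query(has_time=False, has_address=True))
```

Only the branch with both conditions joins two tables. Every other branch reads a single table.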

No conditions

With no conditions, the only data needed for the all-trips lookup is the has_post column of the Point table. Only the Trip and Point tables are involved.

select distinct p1_0.trip_id
from point p1_0
where p1_0.has_post = ?
order by p1_0.trip_id desc limit ?

select t1_0.trip_id,
       t1_0.created_at,
       t1_0.image_url,
       t1_0.member_id,
       t1_0.name,
       t1_0.route_image_url,
       t1_0.status,
       t1_0.updated_at
from trip t1_0
where t1_0.trip_id in (?, ?, ?)
order by t1_0.trip_id desc

Time condition only

With only a time condition, the data needed for the all-trips lookup is the recorded_at and has_post columns of the Point table. Only the Trip and Point tables are involved.

select distinct p1_0.trip_id
from point p1_0
where dayofweek(p1_0.recorded_at) = ?
  and p1_0.has_post = ?
order by p1_0.trip_id desc limit ?

select t1_0.trip_id,
       t1_0.created_at,
       t1_0.image_url,
       t1_0.member_id,
       t1_0.name,
       t1_0.route_image_url,
       t1_0.status,
       t1_0.updated_at
from trip t1_0
where t1_0.trip_id in (?, ?, ?)
order by t1_0.trip_id desc

Address condition only

With only an address condition, the data needed for the all-trips lookup is the address column of the Post table. Only the Trip and Post tables are involved.

select distinct p1_0.trip_id
from post p1_0
where p1_0.address like ? escape '!'
order by p1_0.trip_id desc limit ?

select t1_0.trip_id,
       t1_0.created_at,
       t1_0.image_url,
       t1_0.member_id,
       t1_0.name,
       t1_0.route_image_url,
       t1_0.status,
       t1_0.updated_at
from trip t1_0
where t1_0.trip_id in (?, ?, ?)
order by t1_0.trip_id desc
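The `escape '!'` clause in the first query means that `!` marks the next character in the pattern as a literal. The sketch below shows one way a "contains" pattern could be built for such a query. `contains_pattern` is a hypothetical helper, and the reduced schema and rows are made up. It runs against in-memory SQLite, which supports the same `LIKE ... ESCAPE` syntax.

```python
import sqlite3

def contains_pattern(keyword: str, escape: str = "!") -> str:
    # Escape the escape character first, then the LIKE wildcards % and _.
    for ch in (escape, "%", "_"):
        keyword = keyword.replace(ch, escape + ch)
    return f"%{keyword}%"

ID_QUERY = (
    "SELECT DISTINCT p.trip_id FROM post p "
    "WHERE p.address LIKE ? ESCAPE '!' "
    "ORDER BY p.trip_id DESC LIMIT ?"
)

# Assumed sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (post_id INTEGER PRIMARY KEY, trip_id INTEGER, address TEXT);
    INSERT INTO post VALUES (100, 1, '100% Seoul'), (200, 2, '100 Seoul'), (300, 3, 'Busan');
""")

# Escaped: the % in the keyword is matched literally.
escaped_ids = conn.execute(ID_QUERY, (contains_pattern("100%"), 10)).fetchall()
# Unescaped: the % in the keyword acts as a wildcard and matches more rows.
unescaped_ids = conn.execute(ID_QUERY, ("%100%%", 10)).fetchall()

print(escaped_ids, unescaped_ids)
```

Without the escaping step, a user-supplied `%` or `_` would widen the search instead of matching that character.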

Both time and address conditions

The Post.address and Point.recorded_at columns are needed, so the Trip, Post, and Point tables are all involved.

select distinct p1_0.trip_id
from point p1_0 
join post p2_0 on p2_0.point_id = p1_0.point_id
where (year (p1_0.recorded_at)=?
  and month (p1_0.recorded_at)=?
  and p2_0.address like ? escape '!')
order by p1_0.trip_id desc limit ?

select t1_0.trip_id,
       t1_0.created_at,
       t1_0.image_url,
       t1_0.member_id,
       t1_0.name,
       t1_0.route_image_url,
       t1_0.status,
       t1_0.updated_at
from trip t1_0
where t1_0.trip_id in (?, ?, ?)
order by t1_0.trip_id desc
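For a feel of how the combined id query behaves, here is a small sketch against in-memory SQLite. SQLite has no `year()` or `month()` functions, so `strftime` stands in for them here. That substitution, the reduced schema, and the sample rows are all assumptions for the example.

```python
import sqlite3

# strftime('%Y' / '%m') replaces the year()/month() calls of the original query.
ID_QUERY = """
    SELECT DISTINCT p1.trip_id
    FROM point p1
    JOIN post p2 ON p2.point_id = p1.point_id
    WHERE CAST(strftime('%Y', p1.recorded_at) AS INTEGER) = ?
      AND CAST(strftime('%m', p1.recorded_at) AS INTEGER) = ?
      AND p2.address LIKE ? ESCAPE '!'
    ORDER BY p1.trip_id DESC LIMIT ?
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE point (point_id INTEGER PRIMARY KEY, trip_id INTEGER, recorded_at TEXT);
    CREATE TABLE post  (post_id INTEGER PRIMARY KEY, trip_id INTEGER, point_id INTEGER, address TEXT);
    INSERT INTO point VALUES (10, 1, '2023-10-01 09:00:00'), (11, 1, '2023-10-02 10:00:00'),
                             (20, 2, '2023-09-15 12:00:00'), (30, 3, '2023-10-05 08:00:00');
    INSERT INTO post  VALUES (100, 1, 10, 'Seoul Jongno'), (101, 1, 11, 'Seoul Mapo'),
                             (200, 2, 20, 'Seoul Gangnam'), (300, 3, 30, 'Busan Haeundae');
""")

# Trips recorded in October 2023 whose post address contains "Seoul".
trip_ids = [row[0] for row in conn.execute(ID_QUERY, (2023, 10, "%Seoul%", 10))]
print(trip_ids)
```

Trip 1 matches on two point/post pairs, but DISTINCT keeps its id once. Trip 2 fails the month filter and trip 3 fails the address filter.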
