Posts in the 'elasticsearch' category

[elasticsearch] "this cluster currently has [1000]/[1000] maximum normal shards open" error

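This error means the cluster has reached its shard limit. The maximum number of shards per node defaults to 1,000, and all 1,000 allowed shards are already open. The official documentation says that cluster.max_shards_per_node limits the total number of primary and replica shards in the cluster. Elasticsearch calculates the limit as cluster.max_shards_per_node * number of non-frozen data nodes. The default is 1000. Elasticsearch rejects any request that would create more shards than this limit allows. Solution: the fix is to raise the maximum shard count. "cluster.max_shards_per_node" was set to the default of 1000. PUT _cluster/settings { "persiste..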
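The limit formula above can be checked with a small Python sketch. Every new primary shard also brings its replicas with it, so an index asks for primaries × (1 + replicas) shards. The function names and the 2000 value in the usage line are illustrative assumptions and are not an Elasticsearch API. The real check happens on the server when an index is created or opened.

```python
def shard_limit(max_shards_per_node: int, non_frozen_data_nodes: int) -> int:
    """Cluster-wide limit: cluster.max_shards_per_node * number of non-frozen data nodes."""
    return max_shards_per_node * non_frozen_data_nodes


def shards_for_new_index(primaries: int, replicas: int) -> int:
    """Shards an index requests: every primary plus its replica copies."""
    return primaries * (1 + replicas)


def would_be_rejected(open_shards: int, primaries: int, replicas: int,
                      max_shards_per_node: int = 1000, data_nodes: int = 1) -> bool:
    """True if creating the index would push the cluster past its shard limit."""
    requested = shards_for_new_index(primaries, replicas)
    return open_shards + requested > shard_limit(max_shards_per_node, data_nodes)


if __name__ == "__main__":
    # The situation from the error message: one data node, 1000 of 1000 shards open.
    print(would_be_rejected(1000, primaries=1, replicas=0))  # True
    # Hypothetical fix: raising the per-node setting to 2000 leaves room again.
    print(would_be_rejected(1000, primaries=1, replicas=0, max_shards_per_node=2000))  # False
```

Adding data nodes raises the limit in the same way as raising the setting, because the two values are multiplied.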

2023. 6. 5. 06:42

elasticsearch

Testing several chat-room/message data structures in elasticsearch

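These are notes I put together while working out how to design chat-room data in Elasticsearch. To find a suitable data structure, I built the actual test code with Spring. Overview: a chat room usually consists of many messages. Problems start to appear once these messages pile up into the hundreds of millions or billions. Many techniques exist for this, but this post covers only the data-structure side. Elasticsearch does not support joins between indices. This is a deliberate performance trade-off: keeping duplicated data and searching it is considered faster than joining. So, in line with how Elasticsearch works, I reviewed the requirements below..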

2023. 1. 27. 08:07

elasticsearch

[elasticsearch] How the docs count changes when adding or updating nested objects

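An index that contains nested objects creates a separate (hidden) document for each nested object. Suppose you add one value (an employee) to a document (a department) that has a nested field. You might expect one document to be created, but two are actually created: the department itself plus a hidden document for the employee. Let's verify this with an example. [Document structure] department (document) employees (nested) Create the department index PUT department { "mappings": { "properties": { "name": { "type": "text" }, "employees": { "type": "nested" } } } } Check the mapping GET department/_..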
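The counting rule above can be written down as a minimal Python sketch. It assumes a single level of nesting: one root document plus one hidden document per nested object. It also assumes that an update rewrites the whole block (root + nested) and marks the old block as deleted. This is why docs.count in `GET _cat/indices` can be higher than the number of documents you indexed. The department data is hypothetical.

```python
def lucene_doc_count(doc: dict, nested_field: str) -> int:
    """Root document plus one hidden document per nested object (one nesting level only)."""
    return 1 + len(doc.get(nested_field, []))


def reindex_cost(old_doc: dict, new_doc: dict, nested_field: str) -> tuple:
    """An update rewrites the whole block: (documents written, documents marked deleted)."""
    return lucene_doc_count(new_doc, nested_field), lucene_doc_count(old_doc, nested_field)


if __name__ == "__main__":
    # Hypothetical department with one employee
    dept = {"name": "Development", "employees": [{"name": "Employee A"}]}
    print(lucene_doc_count(dept, "employees"))  # 2
    # Adding a second employee reindexes the department and both employees
    bigger = {**dept, "employees": dept["employees"] + [{"name": "Employee B"}]}
    print(reindex_cost(dept, bigger, "employees"))  # (3, 2)
```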

2022. 12. 31. 14:52

elasticsearch

Querying elasticsearch cache statistics

Statistics # Node stats GET _nodes/stats # fielddata GET _nodes/stats/indices/fielddata # query cache GET _nodes/stats/indices/query_cache # request cache GET _nodes/stats/indices/request_cache # fielddata, query cache, and request cache in one call GET _nodes/stats/indices/request_cache,query_cache,fielddata?human&pretty Per-index statistics # fielddata, query cache, and request cache for a specific index GET {index-name}/_..
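The combined call above returns one block per node. The Python sketch below pulls out `memory_size_in_bytes` for the three caches. It assumes the usual response shape `nodes → <node id> → indices → fielddata / query_cache / request_cache`. The sample response is hand-written for illustration and contains no real measurements.

```python
import json

CACHES = ("fielddata", "query_cache", "request_cache")


def cache_memory(stats: dict) -> dict:
    """Map node name -> {cache: memory_size_in_bytes} from a GET _nodes/stats response."""
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        indices = node.get("indices", {})
        # Missing caches (e.g. when only some metrics were requested) count as 0
        result[node.get("name", node_id)] = {
            cache: indices.get(cache, {}).get("memory_size_in_bytes", 0)
            for cache in CACHES
        }
    return result


if __name__ == "__main__":
    # Illustrative, hand-written response fragment (not real measurements)
    sample = json.loads("""
    {"nodes": {"abc123": {"name": "node-1", "indices": {
        "fielddata": {"memory_size_in_bytes": 1024},
        "query_cache": {"memory_size_in_bytes": 2048},
        "request_cache": {"memory_size_in_bytes": 0}}}}}
    """)
    print(cache_memory(sample))
```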

2022. 12. 23. 07:04

elasticsearch

Removing stopwords in the nori tokenizer

By default, nori runs morphological analysis on everything, including words that are useless for search such as particles (josa), verbal endings (eomi), and interjections. Let's configure stopwords for the morphemes we don't need. First, analyze "사랑하다" ("to love") with nori. POST _analyze { "tokenizer": "nori_tokenizer", "text": "사랑하다" } Result { "tokens" : [ { "token" : "사랑", "start_offset" : 0, "end_offset" : 2, "type" : "word", "position" : 0 }, { "token" : "하", "start_offset" : 2, "end_offset" : 3, "type" : "word", "position" : 1 }, { "token" : "다",..
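In Elasticsearch, dropping tokens by part of speech is done with the `nori_part_of_speech` token filter and its `stoptags` list. The Python sketch below only mimics that filtering step. The tag assigned to each token of "사랑하다" (사랑=NNG, 하=XSV, 다=E) is an assumption. You can confirm the actual tags with `"explain": true` in the `_analyze` request. The stoptag set here is a small subset chosen for illustration.

```python
# Illustrative subset of POS tags to drop (E = verbal ending, IC = interjection,
# J = particle, XSV = verb-deriving suffix). The real list goes in `stoptags`.
STOPTAGS = {"E", "IC", "J", "XSV"}


def drop_stop_pos(tokens, stoptags=STOPTAGS):
    """Keep only tokens whose POS tag is not in stoptags (mimics nori_part_of_speech)."""
    return [term for term, pos in tokens if pos not in stoptags]


if __name__ == "__main__":
    # Assumed tags for the tokens of "사랑하다"
    tokens = [("사랑", "NNG"), ("하", "XSV"), ("다", "E")]
    print(drop_stop_pos(tokens))  # ['사랑']
```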

2022. 12. 23. 07:01

elasticsearch

Exporting/importing an index with elasticdump

First, install elasticdump.
Install: npm install elasticdump -g
Export: elasticdump --input=http://localhost:9200/test_index --output=/tmp/test_index.json --type=data
Import: elasticdump --input=/tmp/test_index.json --output=http://localhost:9200/test_index --type=data
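The commands above copy only documents (`--type=data`), so the target index gets dynamically mapped fields. The Python sketch below builds the same commands plus a `--type=mapping` step that runs first. It writes mapping and data to separate files. The `_mapping.json` file-name suffix and the helper names are my own assumptions. Nothing is executed unless `run_all` is called.

```python
import subprocess


def export_cmds(index_url: str, file_prefix: str) -> list:
    """elasticdump commands that save an index's mapping and its documents to separate files."""
    return [
        ["elasticdump", f"--input={index_url}", f"--output={file_prefix}_mapping.json", "--type=mapping"],
        ["elasticdump", f"--input={index_url}", f"--output={file_prefix}.json", "--type=data"],
    ]


def import_cmds(file_prefix: str, index_url: str) -> list:
    """Reverse direction: restore the mapping first so fields keep their types, then the data."""
    return [
        ["elasticdump", f"--input={file_prefix}_mapping.json", f"--output={index_url}", "--type=mapping"],
        ["elasticdump", f"--input={file_prefix}.json", f"--output={index_url}", "--type=data"],
    ]


def run_all(cmds: list) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in export_cmds("http://localhost:9200/test_index", "/tmp/test_index"):
        print(" ".join(cmd))
```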

2022. 12. 23. 07:00

elasticsearch

Using the elasticsearch join field

Index definition PUT department { "mappings": { "properties": { "join_field": { "type": "join", "relations": { "department": "employee" } } } } } Insert data PUT department/_doc/1 { "name": "Development", "join_field": "department" } PUT department/_doc/2 { "name": "Marketing", "join_field": "department" } PUT department/_doc/3?routing=1 { "name": "Bo Anderson", "age": 28, "gender": "M", "join_field": { "name"..
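Child documents must be indexed on their parent's shard, which is why the request above uses `?routing=1`. The Python sketch below builds an NDJSON body for `POST _bulk` that applies the same rule to many documents at once. It puts `routing` in each child's action line. The child `join_field` shape `{"name": "employee", "parent": "<id>"}` follows the Elasticsearch join field documentation, since the author's own request is truncated here. The helper name and the sample data are hypothetical.

```python
import json


def bulk_body(index: str, departments, employees) -> str:
    """NDJSON for POST _bulk: parents first, then children routed to their parent's shard."""
    lines = []
    for dep_id, name in departments:
        lines.append(json.dumps({"index": {"_index": index, "_id": dep_id}}))
        lines.append(json.dumps({"name": name, "join_field": "department"}))
    for emp_id, parent_id, doc in employees:
        # routing = parent id keeps the child on the same shard as its parent
        lines.append(json.dumps({"index": {"_index": index, "_id": emp_id, "routing": parent_id}}))
        lines.append(json.dumps({**doc, "join_field": {"name": "employee", "parent": parent_id}}))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


if __name__ == "__main__":
    print(bulk_body(
        "department",
        [("1", "Development"), ("2", "Marketing")],
        [("3", "1", {"name": "Bo Anderson", "age": 28, "gender": "M"})],
    ))
```

Send the result with `Content-Type: application/x-ndjson`.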

2022. 12. 23. 07:00

elasticsearch

Using _update on an elasticsearch nested type

Create the post index PUT post { "mappings": { "properties": { "id": { "type": "keyword" }, "userId": { "type": "keyword" }, "userName": { "type": "keyword" }, "comments": { "type": "nested", "properties": { "commentId": { "type": "keyword" }, "text": { "type": "text" } } } } } } Insert data with id 1 POST post/_doc/1 { "id": "1", "userId": "_", "userName": "_", "comments": [] } Add data to comments POST post/_update..
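A nested array inside one document is usually changed with a Painless script in the `_update` body. The Python sketch below builds such bodies for adding, editing, and removing a comment in the `comments` array of the post index above. The scripts are my own illustration. The helper names and the `c1` comment id are hypothetical. The result would be sent to `POST post/_update/1`.

```python
import json


def add_comment(comment: dict) -> dict:
    """Body for POST post/_update/<id>: append one object to the nested comments array."""
    return {
        "script": {
            "lang": "painless",
            "source": "ctx._source.comments.add(params.comment)",
            "params": {"comment": comment},
        }
    }


def edit_comment(comment_id: str, text: str) -> dict:
    """Change the text of every comment whose commentId matches."""
    return {
        "script": {
            "lang": "painless",
            "source": (
                "for (def c : ctx._source.comments) {"
                " if (c.commentId == params.commentId) { c.text = params.text; } }"
            ),
            "params": {"commentId": comment_id, "text": text},
        }
    }


def remove_comment(comment_id: str) -> dict:
    """Drop every comment whose commentId matches."""
    return {
        "script": {
            "lang": "painless",
            "source": "ctx._source.comments.removeIf(c -> c.commentId == params.commentId)",
            "params": {"commentId": comment_id},
        }
    }


if __name__ == "__main__":
    print(json.dumps(add_comment({"commentId": "c1", "text": "first comment"}), indent=2))
```

Passing values through `params` instead of concatenating them into `source` lets Elasticsearch reuse the compiled script.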

2022. 12. 23. 06:57

elasticsearch

Using _update_by_query on an elasticsearch nested type

How to add, update, and remove nested-type data with _update_by_query. First, create the index (the post index contains comments as a nested field). Create the index PUT posts { "mappings": { "properties": { "postId": { "type": "keyword" }, "title": { "type": "keyword" }, "comments": { "type": "nested", "properties": { "commentId": { "type": "keyword" }, "text": { "type": "text" } } } } } } Insert data: add the post1 data. POST posts/_doc/1 { "postId":..
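With `_update_by_query`, a `nested` query first selects only the posts that hold the target comment. A script then rewrites the matching element of each selected post. The Python sketch below builds such bodies for the posts index above. The script text, helper names, and the `c1` id are illustrative assumptions. The body would be sent to `POST posts/_update_by_query`.

```python
import json


def _posts_with_comment(comment_id: str) -> dict:
    """Nested query matching posts that contain a comment with this commentId."""
    return {
        "nested": {
            "path": "comments",
            "query": {"term": {"comments.commentId": comment_id}},
        }
    }


def edit_comment_by_query(comment_id: str, new_text: str) -> dict:
    """Body that edits one comment in every post holding it."""
    return {
        "query": _posts_with_comment(comment_id),
        "script": {
            "lang": "painless",
            "source": (
                "for (def c : ctx._source.comments) {"
                " if (c.commentId == params.commentId) { c.text = params.text; } }"
            ),
            "params": {"commentId": comment_id, "text": new_text},
        },
    }


def remove_comment_by_query(comment_id: str) -> dict:
    """Body that deletes one comment from every post holding it."""
    return {
        "query": _posts_with_comment(comment_id),
        "script": {
            "lang": "painless",
            "source": "ctx._source.comments.removeIf(c -> c.commentId == params.commentId)",
            "params": {"commentId": comment_id},
        },
    }


if __name__ == "__main__":
    print(json.dumps(edit_comment_by_query("c1", "edited text"), indent=2))
```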

2022. 12. 22. 20:39

elasticsearch

decompound_mode in elasticsearch nori

decompound_mode determines how the tokenizer handles compound nouns, that is, whether and how a compound noun is split into words. Parameter values: none: does not split compound nouns (e.g. 월미도, 영종도 stay whole); discard: splits compound nouns and drops the original token (e.g. 잠실역 => [잠실, 역]); mixed: splits compound nouns and keeps the original token (e.g. 잠실역 => [잠실, 역, 잠실역]). The default is discard. Create the nori_analyzer index (decompound_mode = mixed) PUT http://localhost:9200/nori_analyzer Content-Type: application/json { "settings": { "index": { "analysis": { "tokenizer": { "no..
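The three modes above differ only in which tokens are emitted for one compound noun. The Python sketch below shows this difference. The decomposition (`잠실역` → `잠실`, `역`) is hard-coded here, whereas nori derives it from its dictionary. In the real output, the order and positions of the tokens may differ from this list, and the example above lists the original last.

```python
def apply_decompound_mode(token: str, parts: list, mode: str = "discard") -> list:
    """Tokens emitted for one compound noun under each decompound_mode value.

    `parts` is the dictionary decomposition nori would find; it is supplied by hand here.
    """
    if mode == "none":
        return [token]          # keep the compound as a single token
    if mode == "discard":
        return list(parts)      # emit only the pieces (nori's default)
    if mode == "mixed":
        return [token, *parts]  # emit the original plus the pieces
    raise ValueError(f"unknown decompound_mode: {mode}")


if __name__ == "__main__":
    for mode in ("none", "discard", "mixed"):
        print(mode, apply_decompound_mode("잠실역", ["잠실", "역"], mode))
```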

2022. 12. 22. 20:38

elasticsearch

elasticsearch nested field type

How object works: Elasticsearch has no concept of inner objects, so it flattens object hierarchies into simple lists of keys and values. PUT index-object/_doc/1 { "group" : "fans", "user" : [ { "first" : "John", "last" : "Smith" }, { "first" : "Alice", "last" : "White" } ] } The user field above is added as an object field. Internally, the document is transformed into something like this: { "group" : "fans", "user.first" : [ "alice", "john" ], "user.last" : [ "smith", "white" ] } The user.first and user.last fields are m..
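The flattening above explains why an object array can match a first name and a last name that came from different people. The Python sketch below contrasts that flattened matching with nested matching, where each object is checked on its own. It keeps the raw values. The lowercased, sorted values shown above come from text analysis, which this sketch does not model. The function names are hypothetical.

```python
def flatten(doc: dict) -> dict:
    """Collapse an object tree into dotted field names -> lists of leaf values."""
    flat = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:  # arrays of objects lose their per-object grouping here
                walk(item, path)
        else:
            flat.setdefault(path, []).append(value)

    walk(doc, "")
    return flat


def object_match(doc: dict, first: str, last: str) -> bool:
    """How an `object` field behaves: each condition is checked against the flattened lists."""
    flat = flatten(doc)
    return first in flat.get("user.first", []) and last in flat.get("user.last", [])


def nested_match(doc: dict, first: str, last: str) -> bool:
    """How a `nested` field behaves: both conditions must hold for the same user object."""
    return any(u["first"] == first and u["last"] == last for u in doc["user"])


if __name__ == "__main__":
    doc = {"group": "fans", "user": [{"first": "John", "last": "Smith"},
                                     {"first": "Alice", "last": "White"}]}
    print(object_match(doc, "Alice", "Smith"))  # True  (false match across objects)
    print(nested_match(doc, "Alice", "Smith"))  # False
```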

2022. 12. 22. 20:38

elasticsearch

Measuring elasticsearch timings across test cases

I measured how fast documents are saved to elasticsearch with ElasticsearchRepository. Test cases: 1. Save speed of save vs. saveAll: save stores documents one at a time, while saveAll stores multiple documents in bulk. 2. Save speed/size with and without explicit field types: specifying @Field types such as keyword, text, and long vs. leaving field types unspecified (both text and keyword are then created). 3. Changing a nested type to an object type: a nested type creates extra docs internally (one per nested element). 4. Speed by shard count (shard=10): setting the shard count to 1 ..

2022. 12. 22. 20:37

elasticsearch

Accessing elasticsearch over REST when basic authentication is enabled

By default, elasticsearch can be accessed over REST. Once authentication is enabled, however, it cannot be accessed without credentials. To enable authentication, add the following to elasticsearch.yml and restart:
xpack.security:
  enabled: true
  transport:
    ssl:
      enabled: true
With authentication enabled, accessing http://localhost:9200/_search returns the following error. { "error": { "root_cause": [ { "type": "security_exception", "reason": "missing authentication credentials for REST request [/_search]",..
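The error above goes away once each request carries an `Authorization: Basic ...` header. The header value is the base64 encoding of `user:password`, which is also what `curl -u user:password` sends. The standard-library Python sketch below builds that header and attaches it to a request. The `elastic`/`changeme` credentials are placeholders. The module-level code only prints the header and does not contact a cluster.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Value for the Authorization header: 'Basic ' + base64('user:password')."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def get_text(url: str, user: str, password: str) -> str:
    """GET an Elasticsearch endpoint with basic auth and return the raw response body."""
    request = urllib.request.Request(url, headers={"Authorization": basic_auth_header(user, password)})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder credentials; no request is sent here
    print(basic_auth_header("elastic", "changeme"))
```

Against a running cluster, `get_text("http://localhost:9200/_search", user, password)` should return search results instead of the security_exception.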

2022. 12. 22. 20:36

elasticsearch

[Translation] Tuning Elasticsearch for indexing speed

This post is a translation of the original below. The translation may be a bit rough, so please bear with me~ Original: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html Use bulk requests: bulk requests perform much better than single-document index requests. To find the optimal size of a bulk request, run a benchmark on a single node with a single shard. First index 100 documents at once, then 200, then 400, and so on, doubling the number of documents per bulk request on each benchmark run. Once the indexing speed plateaus, the bulk ..
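The doubling procedure above can be written as a small Python loop. It assumes a caller-supplied `measure(size)` that runs one benchmark at that bulk size and returns documents per second. Treating a gain below 5% as a plateau (`min_gain=0.05`) is my own assumption, because the guide does not give a threshold. The throughput numbers in the test are invented and are not benchmark results.

```python
def find_bulk_size(measure, start: int = 100, max_size: int = 100_000,
                   min_gain: float = 0.05) -> int:
    """Double the bulk size until throughput stops improving by at least min_gain."""
    size = start
    best_size, best_rate = size, measure(size)
    while size * 2 <= max_size:
        size *= 2
        rate = measure(size)
        if rate < best_rate * (1 + min_gain):
            break  # throughput has plateaued (or dropped): stop doubling
        best_size, best_rate = size, rate
    return best_size


if __name__ == "__main__":
    # Invented throughput curve, only to show the control flow
    fake_rates = {100: 1000, 200: 1900, 400: 3500, 800: 3600, 1600: 3550}
    print(find_bulk_size(lambda s: fake_rates[s]))  # 400
```

Stopping at the first plateau keeps requests small, which also limits memory pressure on the node.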

2022. 12. 22. 20:35

elasticsearch

AND search on an elasticsearch nested field

This is how to search a nested field with AND conditions in elasticsearch. Create an index whose nested field metadata holds key and value. PUT test_nested { "mappings": { "properties": { "metadata": { "type": "nested", "properties": { "key": { "type": "keyword" }, "value": { "type": "keyword" } } } } } } Insert the following data (key / value): 과일 (fruit) / 사과 (apple), 과일 / 바나나 (banana), 식사 (meal) / 비빔밥 (bibimbap). Now search with these conditions: an entry with key = '과일' and value = '바나나' should be found. GET test_nested/_search {..
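For AND conditions to apply to the same metadata entry, both term clauses must sit inside one `nested` query's `bool.must`. If you instead write two separate nested queries under an outer `bool`, a document whose metadata holds 과일/사과 and 식사/비빔밥 would also match key=과일 AND value=비빔밥. The Python sketch below builds the single-nested form. The helper name is hypothetical, and it assumes keyword fields, as in the mapping above, so `term` is appropriate.

```python
import json


def nested_and_query(path: str, conditions: dict) -> dict:
    """All conditions must hold for the SAME nested object under `path`."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            {"term": {f"{path}.{field}": value}}
                            for field, value in conditions.items()
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = nested_and_query("metadata", {"key": "과일", "value": "바나나"})
    print(json.dumps(body, ensure_ascii=False, indent=2))
```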

2022. 12. 22. 20:35
