Commit f16630a

Add concepts page to getting started (#10136)
* Add concepts page to getting started

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Signed-off-by: Nathan Bower <nbower@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
1 parent 6468b72 commit f16630a

4 files changed: +114 -30 lines changed

_dashboards/visualize/viz-index.md

Lines changed: 11 additions & 0 deletions
@@ -3,6 +3,7 @@ layout: default
33
title: Building data visualizations
44
nav_order: 40
55
has_children: true
6+
has_toc: false
67
---
78

89
# Building data visualizations
@@ -120,3 +121,13 @@ VisBuilder is a drag-and-drop data visualization tool in Dashboards. It gives yo
120121
[Vega](https://vega.github.io/vega/) and [Vega-Lite](https://vega.github.io/vega-lite/) are open-source, declarative language visualization grammars for creating, sharing, and saving interactive data visualizations. Vega visualizations give you the flexibility to visualize multidimensional data using a layered approach in order to build and manipulate visualizations in a structured manner. Vega can be used to create customized visualizations using any Dashboards visualization type.
121122

122123
<img src="{{site.url}}{{site.baseurl}}/images/dashboards/vega-1.png" width="600" height="600" alt="Example Vega visualization with JSON specification in OpenSearch Dashboards">
124+
125+
## Next steps
126+
127+
To try building various visualizations, see the following articles:
128+
129+
- [Area charts]({{site.url}}{{site.baseurl}}/dashboards/visualize/area/)
130+
- [Coordinate and region maps]({{site.url}}{{site.baseurl}}/dashboards/visualize/geojson-regionmaps/)
131+
- [Time-series visualizations]({{site.url}}{{site.baseurl}}/dashboards/visualize/tsvb/)
132+
- [Vega visualizations]({{site.url}}{{site.baseurl}}/dashboards/visualize/vega/)
133+
- [Drag-and-drop visualizations]({{site.url}}{{site.baseurl}}/dashboards/visualize/visbuilder/)

_getting-started/concepts.md

Lines changed: 100 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,100 @@
1+
---
2+
layout: default
3+
title: Concepts
4+
nav_order: 70
5+
---
6+
7+
# Concepts
8+
9+
This page defines key terms and concepts related to OpenSearch.
10+
11+
## Basic concepts
12+
13+
- [***Document***]({{site.url}}{{site.baseurl}}/getting-started/intro/#document): The basic unit of information in OpenSearch, stored in JSON format.
14+
- [***Index***]({{site.url}}{{site.baseurl}}/getting-started/intro/#index): A collection of related documents.
15+
- [***JSON (JavaScript Object Notation)***](https://www.json.org/): A text format used to store data in OpenSearch, representing information as key-value pairs.
16+
- [***Mapping***]({{site.url}}{{site.baseurl}}/field-types/): The schema definition for an index that specifies how documents and their fields should be stored and indexed.
17+
18+
## Cluster architecture
19+
20+
- [***Node***]({{site.url}}{{site.baseurl}}/getting-started/intro/#clusters-and-nodes): A single server that is part of an OpenSearch cluster.
21+
- [***Cluster***]({{site.url}}{{site.baseurl}}/getting-started/intro/#clusters-and-nodes): A collection of OpenSearch nodes working together.
22+
- [***Cluster manager***]({{site.url}}{{site.baseurl}}/getting-started/intro/#clusters-and-nodes): The node responsible for managing cluster-wide operations.
23+
- [***Shard***]({{site.url}}{{site.baseurl}}/getting-started/intro/#shards): A subset of an index's data; indexes are split into shards for distribution across nodes.
24+
- [***Primary shard***]({{site.url}}{{site.baseurl}}/getting-started/intro/#primary-and-replica-shards): The original shard containing index data.
25+
- [***Replica shard***]({{site.url}}{{site.baseurl}}/getting-started/intro/#primary-and-replica-shards): A copy of a primary shard for redundancy and search performance.
26+
27+
28+
## Data structures and storage
29+
30+
- [***Doc values***]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/doc-values/): An on-disk data structure for efficient sorting and aggregating of field values.
31+
- [***Inverted index***]({{site.url}}{{site.baseurl}}/getting-started/intro/#inverted-index): A data structure that maps words to the documents containing them.
32+
- ***Lucene***: The underlying search library that OpenSearch uses to index and search data.
33+
- ***Segment***: An immutable unit of data storage within a shard.
34+
35+
## Data operations
36+
37+
- ***Ingestion***: The process of adding data to OpenSearch.
38+
- [***Indexing***]({{site.url}}{{site.baseurl}}/api-reference/document-apis/index-document/): The process of storing and organizing data in OpenSearch to make it searchable.
39+
- [***Bulk indexing***]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/): The process of indexing multiple documents in a single request.
40+
41+
## Text analysis
42+
43+
- [***Text analysis***]({{site.url}}{{site.baseurl}}/analyzers/): The process of splitting the unstructured free-text content of a document into a sequence of terms, which are then stored in an inverted index.
44+
- [***Analyzer***]({{site.url}}{{site.baseurl}}/analyzers/#analyzers): A component that processes text to prepare it for search. Analyzers convert text into terms that are stored in the inverted index.
45+
- [***Tokenizer***]({{site.url}}{{site.baseurl}}/analyzers/tokenizers/index/): The component of an analyzer that splits text into individual tokens (usually words) and records metadata about their positions.
46+
- [***Token filter***]({{site.url}}{{site.baseurl}}/analyzers/token-filters/index/): The final component of an analyzer, which modifies, adds, or removes tokens after tokenization. Examples include lowercase conversion, stopword removal, and synonym addition.
47+
- [***Token***]({{site.url}}{{site.baseurl}}/analyzers/): A unit of text created by a tokenizer during text analysis. Tokens can be modified by token filters and contain metadata used in the text analysis process.
48+
- [***Term***]({{site.url}}{{site.baseurl}}/analyzers/): A data value that is directly stored in the inverted index and used for matching during search operations. Terms have minimal associated metadata.
49+
- [***Character filter***]({{site.url}}{{site.baseurl}}/analyzers/character-filters/index/): The first component of an analyzer, which processes raw text by adding, removing, or modifying characters before tokenization.
50+
- [***Normalizer***]({{site.url}}{{site.baseurl}}/analyzers/normalizers/): A special type of analyzer that processes text without tokenization. It can only perform character-level operations and cannot modify whole tokens.
51+
- [***Stemming***]({{site.url}}{{site.baseurl}}/analyzers/stemming/): The process of reducing words to their root or base form, known as the _stem_.
52+
53+
## Search and query concepts
54+
55+
- ***Query***: A request to OpenSearch that describes what you're searching for in your data.
56+
- ***Query clause***: A single condition within a query that specifies criteria for matching documents.
57+
- [***Filter***]({{site.url}}{{site.baseurl}}/query-dsl/query-filter-context/#filter-context): A query component that includes or excludes documents based on yes-or-no criteria, without calculating a relevance score.
58+
- [***Filter context***]({{site.url}}{{site.baseurl}}/query-dsl/query-filter-context/): The context in which a query clause answers the question _"Does the document match the query clause?"_ The answer is yes or no, and no relevance score is calculated.
59+
- [***Query context***]({{site.url}}{{site.baseurl}}/query-dsl/query-filter-context/): The context in which a query clause answers the question _"How well does the document match the query clause?"_ The answer is expressed as a relevance score.
60+
- [***Full-text search***]({{site.url}}{{site.baseurl}}/query-dsl/term-vs-full-text/): Search that analyzes and matches text fields, considering variations in word forms.
61+
- [***Keyword search***]({{site.url}}{{site.baseurl}}/query-dsl/term-vs-full-text/): Search that requires exact text matches.
62+
- [***Query domain-specific language (DSL)***]({{site.url}}{{site.baseurl}}/query-dsl/): OpenSearch's primary query language for creating complex, customizable searches.
63+
- [***Query string query language***]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/): A simplified query syntax that can be used in URL parameters.
64+
- [***Dashboards Query Language (DQL)***]({{site.url}}{{site.baseurl}}/dashboards/dql/): A simple text-based query language used specifically for filtering data in OpenSearch Dashboards.
65+
- [***Piped Processing Language (PPL)***]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/): A query language that uses pipe syntax (`|`) to chain commands for data processing and analysis. It is primarily used for observability use cases in OpenSearch.
66+
- [***Relevance score***]({{site.url}}{{site.baseurl}}/getting-started/intro/#relevance): A number indicating how well a document matches a query.
67+
- [***Aggregation***]({{site.url}}{{site.baseurl}}/aggregations/): A way to analyze and summarize data based on a search query.
68+
69+
## Vector search concepts
70+
71+
See [Vector search concepts]({{site.url}}{{site.baseurl}}/vector-search/getting-started/concepts/).
72+
73+
## Advanced concepts
74+
75+
The following section describes more advanced OpenSearch concepts.
76+
77+
### Update lifecycle
78+
79+
The lifecycle of an update operation consists of the following steps:
80+
81+
1. An update is received by a primary shard and is written to the shard's transaction log ([translog](#translog)). The translog is flushed to disk (followed by an fsync) before the update is acknowledged. This guarantees durability.
82+
1. The update is also passed to the Lucene index writer, which adds it to an in-memory buffer.
83+
1. On a [refresh operation](#refresh), the Lucene index writer flushes the in-memory buffers to disk (with each buffer becoming a new Lucene segment), and a new index reader is opened over the resulting segment files. The updates are now visible for search.
84+
1. On a [flush operation](#flush), the shard fsyncs the Lucene segments. Because the segment files are a durable representation of the updates, the translog is no longer needed to provide durability, so the updates can be purged from the translog.
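
The four steps above can be sketched as a small in-memory model. This is a deliberately simplified illustration, not OpenSearch or Lucene code: the `ToyShard` class and its attributes are hypothetical, real fsync calls are replaced by a Boolean flag, and a flush here only handles segments that a refresh has already written.

```python
class ToyShard:
    """A toy model of a primary shard's translog, indexing buffer, and segments."""

    def __init__(self):
        self.translog = []  # stands in for the durable transaction log
        self.buffer = []    # stands in for the in-memory Lucene buffer
        self.segments = []  # each entry: {"docs": tuple, "fsynced": bool}

    def index(self, doc):
        self.translog.append(doc)  # step 1: record the update in the translog
        self.buffer.append(doc)    # step 2: hand the update to the index writer
        return "acknowledged"

    def refresh(self):
        # Step 3: the buffer becomes a new immutable segment that is searchable.
        if self.buffer:
            self.segments.append({"docs": tuple(self.buffer), "fsynced": False})
            self.buffer = []

    def flush(self):
        # Step 4: make existing segments durable, then purge from the translog
        # every update that those segments now cover.
        for segment in self.segments:
            segment["fsynced"] = True
        self.translog = list(self.buffer)

    def search(self, predicate):
        # Only documents in segments are visible to search.
        return [d for seg in self.segments for d in seg["docs"] if predicate(d)]


shard = ToyShard()
shard.index({"id": 1, "title": "hello"})
print(shard.search(lambda d: True))       # [] because no refresh has happened yet
shard.refresh()
print(len(shard.search(lambda d: True)))  # 1
shard.flush()
print(shard.translog)                     # []
```
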
85+
86+
### Translog
87+
88+
An indexing or bulk call responds when the documents have been written to the translog and the translog is flushed to disk, so the updates are durable. The updates will not be visible to search requests until after a [refresh operation](#refresh).
89+
90+
### Refresh
91+
92+
Periodically, OpenSearch performs a _refresh_ operation, which writes the documents from the in-memory Lucene index to files. These files are not guaranteed to be durable because an `fsync` is not performed. A refresh makes documents available for search.
93+
94+
### Flush
95+
96+
A _flush_ operation persists the files to disk using `fsync`, ensuring durability. Flushing ensures that the data stored only in the translog is recorded in the Lucene index. OpenSearch performs a flush as needed to ensure that the translog does not grow too large.
97+
98+
### Merge
99+
100+
In OpenSearch, a shard is a Lucene index, which consists of _segments_ (or segment files). Segments store the indexed data and are immutable. Periodically, smaller segments are merged into larger ones. Merging reduces the overall number of segments on each shard, frees up disk space, and improves search performance. Eventually, segments reach a maximum size specified in the merge policy and are no longer merged into larger segments. The merge policy also specifies how often merges are performed.
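
As a rough sketch of the merging behavior described above, the following Python function repeatedly merges the two smallest segments until the result would exceed a maximum segment size. This greedy strategy is an assumption chosen for clarity and is not the merge policy that Lucene actually uses. The function name `merge_segments` and the sizes (in arbitrary units) are hypothetical.

```python
import heapq


def merge_segments(sizes, max_size):
    """Greedily merge the two smallest segments while the result fits in max_size."""
    heap = list(sizes)
    heapq.heapify(heap)
    while len(heap) >= 2:
        smallest = heapq.heappop(heap)
        next_smallest = heapq.heappop(heap)
        if smallest + next_smallest > max_size:
            # If the two smallest segments cannot be merged, no other pair can.
            heapq.heappush(heap, smallest)
            heapq.heappush(heap, next_smallest)
            break
        # The merged segment replaces the two smaller ones.
        heapq.heappush(heap, smallest + next_smallest)
    return sorted(heap)


# Hypothetical segment sizes: five segments collapse into two.
print(merge_segments([1, 2, 3, 10, 40], max_size=50))  # [16, 40]
```

The example shows both effects named in the paragraph: the segment count drops, and segments that have reached the size limit are left alone.
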

_getting-started/intro.md

Lines changed: 0 additions & 29 deletions
@@ -127,35 +127,6 @@ Individual words in a search query are called search _terms_. Each search term i
127127

128128
OpenSearch uses the BM25 ranking algorithm to calculate document relevance scores and then returns the results sorted by relevance. To learn more, see [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25).
129129

130-
## Advanced concepts
131-
132-
The following section describes more advanced OpenSearch concepts.
133-
134-
### Update lifecycle
135-
136-
The lifecycle of an update operation consists of the following steps:
137-
138-
1. An update is received by a primary shard and is written to the shard's transaction log ([translog](#translog)). The translog is flushed to disk (followed by an fsync) before the update is acknowledged. This guarantees durability.
139-
1. The update is also passed to the Lucene index writer, which adds it to an in-memory buffer.
140-
1. On a [refresh operation](#refresh), the Lucene index writer flushes the in-memory buffers to disk (with each buffer becoming a new Lucene segment), and a new index reader is opened over the resulting segment files. The updates are now visible for search.
141-
1. On a [flush operation](#flush), the shard fsyncs the Lucene segments. Because the segment files are a durable representation of the updates, the translog is no longer needed to provide durability, so the updates can be purged from the translog.
142-
143-
### Translog
144-
145-
An indexing or bulk call responds when the documents have been written to the translog and the translog is flushed to disk, so the updates are durable. The updates will not be visible to search requests until after a [refresh operation](#refresh).
146-
147-
### Refresh
148-
149-
Periodically, OpenSearch performs a _refresh_ operation, which writes the documents from the in-memory Lucene index to files. These files are not guaranteed to be durable because an `fsync` is not performed. A refresh makes documents available for search.
150-
151-
### Flush
152-
153-
A _flush_ operation persists the files to disk using `fsync`, ensuring durability. Flushing ensures that the data stored only in the translog is recorded in the Lucene index. OpenSearch performs a flush as needed to ensure that the translog does not grow too large.
154-
155-
### Merge
156-
157-
In OpenSearch, a shard is a Lucene index, which consists of _segments_ (or segment files). Segments store the indexed data and are immutable. Periodically, smaller segments are merged into larger ones. Merging reduces the overall number of segments on each shard, frees up disk space, and improves search performance. Eventually, segments reach a maximum size specified in the merge policy and are no longer merged into larger segments. The merge policy also specifies how often merges are performed.
158-
159130
## Next steps
160131

161132
- Learn how to install OpenSearch within minutes in [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/).

_search-plugins/index.md

Lines changed: 3 additions & 1 deletion
@@ -72,7 +72,9 @@ OpenSearch offers several ways to improve search performance:
7272

7373
## Search relevance
7474

75-
OpenSearch provides the following search relevance features:
75+
*Search relevance* is a measure of how well a document matches a search query. When you run a search query, OpenSearch compares the words in your query to the words in each document and assigns a relevance score based on factors such as how frequently the words appear and how closely they match. For more information, see [Relevance]({{site.url}}{{site.baseurl}}/getting-started/intro/#relevance).
76+
77+
To help you fine-tune and improve search relevance, OpenSearch provides several specialized features:
7678

7779
- [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/using-search-relevance-workbench/): A suite of tools that support search quality improvements through experimentation.
7880
