diff --git a/docs/data-operate/import/data-source/aws-kinesis.md b/docs/data-operate/import/data-source/aws-kinesis.md
new file mode 100644
index 0000000000000..50a60cc279233
--- /dev/null
+++ b/docs/data-operate/import/data-source/aws-kinesis.md
@@ -0,0 +1,88 @@
+---
+{
+    "title": "AWS-Kinesis",
+    "language": "en",
+    "description": "Apache Doris continuously imports data from AWS Kinesis Data Streams through Routine Load. It can automatically and continuously consume data from Kinesis streams and import it into Doris tables."
+}
+---
+
+## Basic Principles
+
+### Core Concept Mapping
+
+| Kinesis | Kafka | Description |
+| --- | --- | --- |
+| Stream | Topic | Named collection of data streams |
+| Shard | Partition | Data shard in a stream; each shard has an independent data sequence |
+| Sequence Number | Offset | Unique identifier of a record in a shard |
+| GetRecords | Consume | API for reading records from a stream |
+
+### AWS Authentication
+
+For AWS authentication when importing from Kinesis, refer to the authentication methods described for importing data from MSK: [AWS MSK](./aws-msk.md)
+
+## Parameters
+
+| Parameter | Description | Default | Example |
+| --- | --- | --- | --- |
+| aws.region | AWS Region | Manually specified | `"us-east-1"` |
+| aws.access_key | AWS Access Key ID | Manually specified | `\` |
+| aws.secret_key | AWS Secret Access Key | Manually specified | `\` |
+| aws.role_arn | Role ARN for cross-account access | Manually specified | `"arn:aws:iam::123456789012:role/MyRole"` |
+| kinesis_stream | Kinesis Stream name | Manually specified | `"my-data-stream"` |
+| kinesis_shards | Comma-separated list of shard IDs to consume. | All shards by default | `"shardId-000000000001,shardId-000000000002"` |
+| kinesis_shards_pos | Starting position for each shard, comma-separated and mapped one-to-one with `kinesis_shards`. 
| `LATEST` | `TRIM_HORIZON` (earliest), `LATEST` (latest), `sequence number` | +| property.kinesis_default_pos | Default shard start position used when `kinesis_shards_pos` is not specified. | `LATEST` | `TRIM_HORIZON` (earliest), `LATEST` (latest), timestamp `"2026-01-01 00:00:00"` | +| Other `property.*` | Parameters with this prefix are passed through from FE to BE. | `\` | `\` | + +## Quick Start + +Because Doris reads data from Kinesis through Routine Load, the operation flow is consistent with the [Routine Load Manual](../import-way/routine-load-manual.md). + +### Create Import + +```SQL +CREATE ROUTINE LOAD [db_name.]job_name ON table_name +[load_properties] +[job_properties] +FROM KINESIS +( + "aws.region" = "us-east-1", + "aws.kinesis_stream" = "", + "aws.access_key" = "", + "aws.secret_key" = "" +); +``` + +### View Import Status + +```SQL +SHOW ROUTINE LOAD FOR job_name; +``` + +Output field description (Kinesis-related fields only) + +| Field | Description | +| --- | --- | +| DataSourceType | Data source type: KINESIS | +| DataSourceProperties | Kinesis data source configurations (region, stream, shards) | +| Progress | Consumption progress (Sequence Number for each shard) | +| Lag | Consumption lag (milliseconds from each shard to the latest data) | + +### Pause Import Job + +```SQL +PAUSE ROUTINE LOAD FOR job_name; +``` + +### Resume Import Job + +```SQL +RESUME ROUTINE LOAD FOR job_name; +``` + +### Delete Import Job + +```SQL +STOP ROUTINE LOAD FOR job_name; +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/aws-kinesis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/aws-kinesis.md new file mode 100644 index 0000000000000..0d6d7b7c67d5d --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/aws-kinesis.md @@ -0,0 +1,89 @@ +--- +{ + "title": "AWS-Kinesis", + "language": "zh-CN", + "description": "Apache Doris 以 
Routine Load 的方式从 AWS Kinesis Data Streams 持续导入数据,能够自动、持续地从 Kinesis 流中消费数据并导入到 Doris 表中。"
+}
+---
+
+## 基本原理
+
+### 核心概念映射
+
+| Kinesis | Kafka | 说明 |
+| --- | --- | --- |
+| Stream | Topic | 数据流的命名集合 |
+| Shard | Partition | 流中的数据分片,每个 Shard 有独立的数据序列 |
+| Sequence Number | Offset | 记录在 Shard 中的唯一标识符 |
+| GetRecords | Consume | 从流中读取数据的 API |
+
+### AWS 认证方式
+
+Kinesis 导入的 AWS 认证方式可以完全参考从 MSK 导入数据时的认证方式:[AWS MSK](./aws-msk.md)
+
+## 参数
+
+| 参数名 | 说明 | 默认值 | 示例 |
+| --- | --- | --- | --- |
+| aws.region | AWS Region | 手动填写 | `"us-east-1"` |
+| aws.access_key | AWS Access Key ID | 手动填写 | `\` |
+| aws.secret_key | AWS Secret Access Key | 手动填写 | `\` |
+| aws.role_arn | 跨账号访问使用的 Role ARN | 手动填写 | `"arn:aws:iam::123456789012:role/MyRole"` |
+| kinesis_stream | Kinesis Stream 名称 | 手动填写 | `"my-data-stream"` |
+| kinesis_shards | 指定要消费的 shard ID 列表,逗号分隔。 | 默认选择所有 shards | `"shardId-000000000001,shardId-000000000002"` |
+| kinesis_shards_pos | 每个 shard 的起始位置,逗号分隔,与 `kinesis_shards` 一一对应。 | `LATEST` | `TRIM_HORIZON`(最早)、`LATEST`(最新)、`sequence number` |
+| property.kinesis_default_pos | 未指定 `kinesis_shards_pos` 时使用的默认 shard 起始位置。 | `LATEST` | `TRIM_HORIZON`(最早)、`LATEST`(最新)、时间戳 `"2026-01-01 00:00:00"` |
+| 其余 `property.*` | 该前缀的参数会从 FE 透传到 BE | `\` | `\` |
+
+## 快速上手
+
+由于 Doris 采用 Routine Load 的方式从 Kinesis 读取数据,因此操作方式与 [Routine Load 手册](../import-way/routine-load-manual.md) 一致。
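+下面给出一个更完整的建作业示例(仅作示意:其中库表名、列名与各参数取值均为假设,请按实际环境替换;参数含义见上文参数表):
+
+```SQL
+CREATE ROUTINE LOAD demo_db.kinesis_job ON demo_table
+COLUMNS(user_id, event_time, payload)
+PROPERTIES
+(
+    "format" = "json",
+    "max_batch_interval" = "20"
+)
+FROM KINESIS
+(
+    "aws.region" = "us-east-1",
+    "aws.kinesis_stream" = "my-data-stream",
+    "aws.access_key" = "<your_access_key>",
+    "aws.secret_key" = "<your_secret_key>",
+    "property.kinesis_default_pos" = "TRIM_HORIZON"
+);
+```
+
+其中 `property.kinesis_default_pos` 设为 `TRIM_HORIZON`,表示从各 shard 最早的数据开始消费;不设置时默认为 `LATEST`(仅消费新数据)。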
+ +### 创建导入 + +``` +CREATE ROUTINE LOAD [db_name.]job_name ON table_name +[load_properties] +[job_properties] +FROM KINESIS +( + "aws.region" = "us-east-1", + "aws.kinesis_stream" = "", + "aws.access_key" = "", + "aws.secret_key" = "" +); +``` + + +### 查看导入状态 + +```SQL +SHOW ROUTINE LOAD FOR job_name; +``` + +输出字段说明(仅展示kinesis相关) + +| 字段 | 说明 | +| --- | --- | +| DataSourceType | 数据源类型:KINESIS | +| DataSourceProperties | Kinesis 数据源配置(region, stream, shards) | +| Progress | 消费进度(每个 Shard 的 Sequence Number) | +| Lag | 消费延迟(每个 Shard 距离最新数据的毫秒数) | + +### 暂停导入作业 + +```SQL +PAUSE ROUTINE LOAD FOR job_name; +``` + +### 恢复导入作业 + +```SQL +RESUME ROUTINE LOAD FOR job_name; +``` + +### 删除导入作业 + +```SQL +STOP ROUTINE LOAD FOR job_name; +``` \ No newline at end of file diff --git a/ja-source/docusaurus-plugin-content-docs/current/data-operate/import/data-source/aws-kinesis.md b/ja-source/docusaurus-plugin-content-docs/current/data-operate/import/data-source/aws-kinesis.md new file mode 100644 index 0000000000000..67c87bf79cac7 --- /dev/null +++ b/ja-source/docusaurus-plugin-content-docs/current/data-operate/import/data-source/aws-kinesis.md @@ -0,0 +1,88 @@ +--- +{ + "title": "AWS-Kinesis", + "language": "ja", + "description": "Apache Doris は Routine Load を通じて AWS Kinesis Data Streams から継続的にデータをインポートします。Kinesis ストリームからデータを自動かつ継続的に消費し、Doris テーブルへ取り込むことができます。" +} +--- + +## 基本原理 + +### コア概念の対応 + +| Kinesis | Kafka | 説明 | +| --- | --- | --- | +| Stream | Topic | データストリームの名前付きコレクション | +| Shard | Partition | ストリーム内のデータシャード。各 Shard は独立したデータシーケンスを持つ | +| Sequence Number | Offset | Shard 内レコードの一意識別子 | +| GetRecords | Consume | ストリームからレコードを読み取る API | + +### AWS 認証方式 + +Kinesis インポート時の AWS 認証方式は、MSK からのデータインポート時の認証方式をそのまま参照できます: [Routine Load Manual](./aws-msk.md) + +## パラメータ + +| パラメータ名 | 説明 | デフォルト値 | 例 | +| --- | --- | --- | --- | +| aws.region | AWS Region | 手動指定 | `"us-east-1"` | +| aws.access_key | AWS Access Key ID | 手動指定 | `\` | +| aws.secret_key | AWS Secret Access Key | 手動指定 | 
`\` | +| aws.role_arn | クロスアカウントアクセス用 Role ARN | 手動指定 | `"arn:aws:iam::123456789012:role/MyRole"` | +| kinesis_stream | Kinesis Stream 名 | 手動指定 | `"my-data-stream"` | +| kinesis_shards | 消費対象の shard ID をカンマ区切りで指定。 | デフォルトですべての shards を選択 | `"shardId-000000000001,shardId-000000000002"` | +| kinesis_shards_pos | 各 shard の開始位置。`kinesis_shards` と 1 対 1 で対応するカンマ区切り。 | `LATEST` | `TRIM_HORIZON`(最古), `LATEST`(最新), `sequence number` | +| property.kinesis_default_pos | `kinesis_shards_pos` 未指定時のデフォルト shard 開始位置。 | `LATEST` | `TRIM_HORIZON`(最古), `LATEST`(最新), タイムスタンプ `"2026-01-01 00:00:00"` | +| その他の `property.*` | このプレフィックスのパラメータは FE から BE へ透過されます。 | `\` | `\` | + +## クイックスタート + +Doris は Routine Load で Kinesis からデータを読み取るため、基本的な操作は [Routine Load Manual](../import-way/routine-load-manual.md) と同じです。 + +### インポートの作成 + +```SQL +CREATE ROUTINE LOAD [db_name.]job_name ON table_name +[load_properties] +[job_properties] +FROM KINESIS +( + "aws.region" = "us-east-1", + "aws.kinesis_stream" = "", + "aws.access_key" = "", + "aws.secret_key" = "" +); +``` + +### インポート状態の確認 + +```SQL +SHOW ROUTINE LOAD FOR job_name; +``` + +出力フィールド説明(Kinesis 関連のみ) + +| フィールド | 説明 | +| --- | --- | +| DataSourceType | データソース種別: KINESIS | +| DataSourceProperties | Kinesis データソース設定(region, stream, shards) | +| Progress | 消費進捗(各 Shard の Sequence Number) | +| Lag | 消費遅延(各 Shard が最新データに追いつくまでのミリ秒) | + +### インポートジョブの一時停止 + +```SQL +PAUSE ROUTINE LOAD FOR job_name; +``` + +### インポートジョブの再開 + +```SQL +RESUME ROUTINE LOAD FOR job_name; +``` + +### インポートジョブの削除 + +```SQL +STOP ROUTINE LOAD FOR job_name; +``` diff --git a/sidebars.ts b/sidebars.ts index 5f05731a615d4..69d4cbd05f526 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -160,6 +160,7 @@ const sidebars: SidebarsConfig = { type: 'category', label: 'Data Source', items: [ + 'data-operate/import/data-source/aws-kinesis', 'data-operate/import/data-source/local-file', 'data-operate/import/data-source/kafka', 'data-operate/import/data-source/flink',