88 changes: 88 additions & 0 deletions docs/data-operate/import/data-source/aws-kinesis.md
@@ -0,0 +1,88 @@
---
{
"title": "AWS-Kinesis",
"language": "en",
"description": "Apache Doris continuously imports data from AWS Kinesis Data Streams through Routine Load. It can automatically and continuously consume data from Kinesis streams and import it into Doris tables."
}
---

## Basic Principles

### Core Concept Mapping

| Kinesis | Kafka | Description |
| --- | --- | --- |
| Stream | Topic | Named collection of data streams |
| Shard | Partition | Data shard in a stream; each shard has an independent data sequence |
| Sequence Number | Offset | Unique identifier of a record in a shard |
| GetRecords | Consume | API for reading records from a stream |

### AWS Authentication

AWS authentication for Kinesis import follows the same approach as importing data from MSK; see the [Routine Load Manual](./aws-msk.md).
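
As a sketch only: when assuming a role instead of using static keys, the source clause might rely on `aws.role_arn` from the parameter table below. Whether additional credentials are still required follows the MSK-style flow linked above, so verify against that page; all names here are placeholders.

```SQL
CREATE ROUTINE LOAD example_db.kinesis_role_job ON example_tbl
FROM KINESIS
(
    "aws.region" = "us-east-1",
    "aws.kinesis_stream" = "<your_stream_name>",
    -- hypothetical: cross-account access via an assumed role
    "aws.role_arn" = "arn:aws:iam::123456789012:role/MyRole"
);
```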

## Parameters

| Parameter | Description | Default | Example |
| --- | --- | --- | --- |
| aws.region | AWS region | None; must be set manually | `"us-east-1"` |
| aws.access_key | AWS Access Key ID | None; must be set manually | - |
| aws.secret_key | AWS Secret Access Key | None; must be set manually | - |
| aws.role_arn | Role ARN for cross-account access | None; must be set manually | `"arn:aws:iam::123456789012:role/MyRole"` |
| kinesis_stream | Kinesis stream name | None; must be set manually | `"my-data-stream"` |
| kinesis_shards | Comma-separated list of shard IDs to consume. | All shards | `"shardId-000000000001,shardId-000000000002"` |
| kinesis_shards_pos | Starting position for each shard, comma-separated and mapped one-to-one onto `kinesis_shards`. | `LATEST` | `TRIM_HORIZON` (earliest), `LATEST` (latest), or a sequence number |
| property.kinesis_default_pos | Default shard start position, used when `kinesis_shards_pos` is not specified. | `LATEST` | `TRIM_HORIZON` (earliest), `LATEST` (latest), or a timestamp such as `"2026-01-01 00:00:00"` |
| Other `property.*` | Parameters with this prefix are passed through from FE to BE. | - | - |

## Quick Start

Because Doris reads data from Kinesis through Routine Load, the operation flow is consistent with the [Routine Load Manual](../import-way/routine-load-manual.md).

### Create Import

```SQL
CREATE ROUTINE LOAD [db_name.]job_name ON table_name
[load_properties]
[job_properties]
FROM KINESIS
(
"aws.region" = "us-east-1",
"aws.kinesis_stream" = "<your_stream_name>",
"aws.access_key" = "<your_ak>",
"aws.secret_key" = "<your_sk>"
);
```
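
To begin consuming from specific shards at specific positions, `kinesis_shards` and `kinesis_shards_pos` from the Parameters table can be combined; their comma-separated entries map one-to-one. This is a sketch with placeholder stream and shard IDs, and it follows the table's unprefixed spelling of these two keys, so verify the exact property names against your Doris version.

```SQL
CREATE ROUTINE LOAD example_db.kinesis_shard_job ON example_tbl
FROM KINESIS
(
    "aws.region" = "us-east-1",
    "aws.kinesis_stream" = "my-data-stream",
    "aws.access_key" = "<your_ak>",
    "aws.secret_key" = "<your_sk>",
    -- read shard 1 from the earliest retained record, shard 2 from the newest
    "kinesis_shards" = "shardId-000000000001,shardId-000000000002",
    "kinesis_shards_pos" = "TRIM_HORIZON,LATEST"
);
```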

### View Import Status

```SQL
SHOW ROUTINE LOAD FOR job_name;
```

Output field description (Kinesis-related fields only)

| Field | Description |
| --- | --- |
| DataSourceType | Data source type: KINESIS |
| DataSourceProperties | Kinesis data source configurations (region, stream, shards) |
| Progress | Consumption progress (Sequence Number for each shard) |
| Lag | Consumption lag (milliseconds from each shard to the latest data) |

### Pause Import Job

```SQL
PAUSE ROUTINE LOAD FOR job_name;
```

### Resume Import Job

```SQL
RESUME ROUTINE LOAD FOR job_name;
```

### Delete Import Job

```SQL
STOP ROUTINE LOAD FOR job_name;
```
@@ -0,0 +1,89 @@
---
{
"title": "AWS-Kinesis",
"language": "zh-CN",
  "description": "Apache Doris continuously imports data from AWS Kinesis Data Streams through Routine Load. It can automatically and continuously consume data from Kinesis streams and import it into Doris tables."
}
---

## Basic Principles

### Core Concept Mapping

| Kinesis | Kafka | Description |
| --- | --- | --- |
| Stream | Topic | Named collection of data streams |
| Shard | Partition | Data shard in a stream; each shard has an independent data sequence |
| Sequence Number | Offset | Unique identifier of a record in a shard |
| GetRecords | Consume | API for reading records from a stream |

### AWS Authentication

AWS authentication for Kinesis import follows the same approach as importing data from MSK; see the [Routine Load Manual](./aws-msk.md).

## Parameters

| Parameter | Description | Default | Example |
| --- | --- | --- | --- |
| aws.region | AWS region | None; must be set manually | `"us-east-1"` |
| aws.access_key | AWS Access Key ID | None; must be set manually | - |
| aws.secret_key | AWS Secret Access Key | None; must be set manually | - |
| aws.role_arn | Role ARN for cross-account access | None; must be set manually | `"arn:aws:iam::123456789012:role/MyRole"` |
| kinesis_stream | Kinesis stream name | None; must be set manually | `"my-data-stream"` |
| kinesis_shards | Comma-separated list of shard IDs to consume. | All shards | `"shardId-000000000001,shardId-000000000002"` |
| kinesis_shards_pos | Starting position for each shard, comma-separated and mapped one-to-one onto `kinesis_shards`. | `LATEST` | `TRIM_HORIZON` (earliest), `LATEST` (latest), or a sequence number |
| property.kinesis_default_pos | Default shard start position, used when `kinesis_shards_pos` is not specified. | `LATEST` | `TRIM_HORIZON` (earliest), `LATEST` (latest), or a timestamp such as `"2026-01-01 00:00:00"` |
| Other `property.*` | Parameters with this prefix are passed through from FE to BE. | - | - |

## Quick Start

Because Doris reads data from Kinesis through Routine Load, the operation flow is consistent with the [Routine Load Manual](../import-way/routine-load-manual.md).

### Create Import

```SQL
CREATE ROUTINE LOAD [db_name.]job_name ON table_name
[load_properties]
[job_properties]
FROM KINESIS
(
"aws.region" = "us-east-1",
"aws.kinesis_stream" = "<your_stream_name>",
"aws.access_key" = "<your_ak>",
"aws.secret_key" = "<your_sk>"
);
```


### View Import Status

```SQL
SHOW ROUTINE LOAD FOR job_name;
```

Output field description (Kinesis-related fields only)

| Field | Description |
| --- | --- |
| DataSourceType | Data source type: KINESIS |
| DataSourceProperties | Kinesis data source configurations (region, stream, shards) |
| Progress | Consumption progress (Sequence Number for each shard) |
| Lag | Consumption lag (milliseconds from each shard to the latest data) |

### Pause Import Job

```SQL
PAUSE ROUTINE LOAD FOR job_name;
```

### Resume Import Job

```SQL
RESUME ROUTINE LOAD FOR job_name;
```

### Delete Import Job

```SQL
STOP ROUTINE LOAD FOR job_name;
```
@@ -0,0 +1,88 @@
---
{
"title": "AWS-Kinesis",
"language": "ja",
  "description": "Apache Doris continuously imports data from AWS Kinesis Data Streams through Routine Load. It can automatically and continuously consume data from Kinesis streams and import it into Doris tables."
}
---

## Basic Principles

### Core Concept Mapping

| Kinesis | Kafka | Description |
| --- | --- | --- |
| Stream | Topic | Named collection of data streams |
| Shard | Partition | Data shard in a stream; each shard has an independent data sequence |
| Sequence Number | Offset | Unique identifier of a record in a shard |
| GetRecords | Consume | API for reading records from a stream |

### AWS Authentication

AWS authentication for Kinesis import follows the same approach as importing data from MSK; see the [Routine Load Manual](./aws-msk.md).

## Parameters

| Parameter | Description | Default | Example |
| --- | --- | --- | --- |
| aws.region | AWS region | None; must be set manually | `"us-east-1"` |
| aws.access_key | AWS Access Key ID | None; must be set manually | - |
| aws.secret_key | AWS Secret Access Key | None; must be set manually | - |
| aws.role_arn | Role ARN for cross-account access | None; must be set manually | `"arn:aws:iam::123456789012:role/MyRole"` |
| kinesis_stream | Kinesis stream name | None; must be set manually | `"my-data-stream"` |
| kinesis_shards | Comma-separated list of shard IDs to consume. | All shards | `"shardId-000000000001,shardId-000000000002"` |
| kinesis_shards_pos | Starting position for each shard, comma-separated and mapped one-to-one onto `kinesis_shards`. | `LATEST` | `TRIM_HORIZON` (earliest), `LATEST` (latest), or a sequence number |
| property.kinesis_default_pos | Default shard start position, used when `kinesis_shards_pos` is not specified. | `LATEST` | `TRIM_HORIZON` (earliest), `LATEST` (latest), or a timestamp such as `"2026-01-01 00:00:00"` |
| Other `property.*` | Parameters with this prefix are passed through from FE to BE. | - | - |

## Quick Start

Because Doris reads data from Kinesis through Routine Load, the operation flow is consistent with the [Routine Load Manual](../import-way/routine-load-manual.md).

### Create Import

```SQL
CREATE ROUTINE LOAD [db_name.]job_name ON table_name
[load_properties]
[job_properties]
FROM KINESIS
(
"aws.region" = "us-east-1",
"aws.kinesis_stream" = "<your_stream_name>",
"aws.access_key" = "<your_ak>",
"aws.secret_key" = "<your_sk>"
);
```

### View Import Status

```SQL
SHOW ROUTINE LOAD FOR job_name;
```

Output field description (Kinesis-related fields only)

| Field | Description |
| --- | --- |
| DataSourceType | Data source type: KINESIS |
| DataSourceProperties | Kinesis data source configurations (region, stream, shards) |
| Progress | Consumption progress (Sequence Number for each shard) |
| Lag | Consumption lag (milliseconds from each shard to the latest data) |

### Pause Import Job

```SQL
PAUSE ROUTINE LOAD FOR job_name;
```

### Resume Import Job

```SQL
RESUME ROUTINE LOAD FOR job_name;
```

### Delete Import Job

```SQL
STOP ROUTINE LOAD FOR job_name;
```
1 change: 1 addition & 0 deletions sidebars.ts
@@ -160,6 +160,7 @@ const sidebars: SidebarsConfig = {
type: 'category',
label: 'Data Source',
items: [
'data-operate/import/data-source/aws-kinesis',
'data-operate/import/data-source/local-file',
'data-operate/import/data-source/kafka',
'data-operate/import/data-source/flink',