Commit 1357ec2

add:pg_ai_query
1 parent 2f7b8e4 commit 1357ec2

4 files changed: +640 -0 lines changed

CN/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@
 **** xref:master/ecosystem_components/pg_cron.adoc[pg_cron]
 **** xref:master/ecosystem_components/pgsql_http.adoc[pgsql-http]
 **** xref:master/ecosystem_components/plpgsql_check.adoc[plpgsql_check]
+**** xref:master/ecosystem_components/pg_ai_query.adoc[pg_ai_query]
 **** xref:master/ecosystem_components/pgroonga.adoc[pgroonga]
 **** xref:master/ecosystem_components/pgaudit.adoc[pgaudit]
 **** xref:master/ecosystem_components/pgrouting.adoc[pgrouting]
Lines changed: 319 additions & 0 deletions
@@ -0,0 +1,319 @@
:sectnums:
:sectnumlevels: 5

= pg_ai_query

== Overview

pg_ai_query is an AI-powered natural-language-to-SQL extension for IvorySQL/PostgreSQL. It uses large language models (LLMs) to convert a user's natural-language description directly into an executable SQL query, and it supports models from several providers, including OpenAI, Anthropic Claude, and Google Gemini.

Project repository: <https://github.com/benodiwal/pg_ai_query/tree/main>

License: Apache-2.0

Key features:

* **Natural language to SQL**: converts plain-language descriptions into valid PostgreSQL queries
* **Multi-model support**: supports models such as gpt-4o-mini, gpt-4o, gpt-5, claude-3-haiku-20240307, claude-sonnet-4-5-20250929, and claude-4.5-opus
* **Security protection**: blocks access to the `information_schema` and `pg_catalog` system tables
* **Scope restriction**: operates only on user tables
* **Configurable limits**: built-in enforcement of row limits
* **API key security**: handles API credentials securely

== Quick Start

=== Installation

*Requirements*

* PostgreSQL 14+ with development headers
* CMake 3.16+
* C++20 compatible compiler
* API key from OpenAI, Anthropic, or Google (Gemini)

*Install dependencies*

[source,bash]
----
sudo apt-get install libcurl4-openssl-dev
----

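Before building, it can help to confirm that the tools from the requirements list are on `PATH`. The following is a minimal shell sketch and not part of pg_ai_query: `missing_tools` is a hypothetical helper name, and the tool list in the example (`pg_config`, `cmake`, `git`) is an assumption to adjust for your environment and compiler.

```shell
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not something pg_ai_query provides.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: report anything missing before running cmake.
# The tool list is an assumption; add your C++ compiler command as needed.
missing="$(missing_tools pg_config cmake git)"
if [ -n "$missing" ]; then
    printf 'Missing build tools:\n%s\n' "$missing" >&2
fi
```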
*Build and install IvorySQL*

To build IvorySQL from source, the following configuration can be used as a reference:

[source,bash]
----
# Example configure options; the install prefix is ./inst under the source tree
./configure \
    --prefix=$PWD/inst \
    --enable-cassert \
    --enable-debug \
    --enable-tap-tests \
    --enable-rpath \
    --enable-nls \
    --enable-injection-points \
    --with-tcl \
    --with-python \
    --with-gssapi \
    --with-pam \
    --with-ldap \
    --with-openssl \
    --with-libedit-preferred \
    --with-uuid=e2fs \
    --with-ossp-uuid \
    --with-libxml \
    --with-libxslt \
    --with-perl \
    --with-icu \
    --with-libnuma
----

*Build and install pg_ai_query*

[source,bash]
----
git clone --recurse-submodules https://github.com/benodiwal/pg_ai_query.git
cd pg_ai_query
mkdir build && cd build
# Put the IvorySQL bin directory first on PATH; adjust the path to your install prefix
export PATH="$HOME/works/repo/ivorysql/IvorySQL/inst/bin:$PATH"
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/works/repo/ivorysql/IvorySQL/inst
make && sudo make install
----

*Create the extension*

[source,sql]
----
CREATE EXTENSION pg_ai_query;
----

=== Configuration

Create the `~/.pg_ai.config` configuration file in your home directory:

[source,ini]
----
[general]
log_level = "INFO"
enable_logging = false

[query]
enforce_limit = true
default_limit = 1000

[response]
show_explanation = true
show_warnings = true
show_suggested_visualization = false
use_formatted_response = false

[anthropic]
# Your Anthropic API key (if using Claude)
api_key = "******"

# Default model to use (options: claude-sonnet-4-5-20250929)
default_model = "claude-sonnet-4-5-20250929"

# Custom API endpoint (optional) - for Anthropic-compatible APIs
api_endpoint = "https://open.bigmodel.cn/api/anthropic"

[prompts]
# Use file paths to read custom prompts
system_prompt = /home/highgo/.pg_ai.prompts
explain_system_prompt = /home/highgo/.pg_ai.explain.prompts
----

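Because this file stores an API key, it is reasonable to keep it readable only by its owner. A minimal sketch under that assumption follows. `write_pg_ai_config` and the `ANTHROPIC_API_KEY` environment variable are hypothetical names used for illustration, and only settings that appear in the configuration above are written. The example call is commented out so that running the snippet does not overwrite an existing file.

```shell
# Write a minimal pg_ai_query config file that only its owner can read.
# Arguments: 1 = destination path, 2 = API key.
# (write_pg_ai_config is a hypothetical helper for illustration.)
write_pg_ai_config() {
    config_path="$1"
    api_key="$2"
    (
        umask 077   # a newly created file gets mode 600
        cat > "$config_path" <<EOF
[query]
enforce_limit = true
default_limit = 1000

[anthropic]
api_key = "$api_key"
default_model = "claude-sonnet-4-5-20250929"
EOF
    )
    chmod 600 "$config_path"   # also tighten a file that already existed
}

# Example (ANTHROPIC_API_KEY is an assumed variable name):
# write_pg_ai_config "$HOME/.pg_ai.config" "$ANTHROPIC_API_KEY"
```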
More examples: <https://github.com/benodiwal/pg_ai_query/blob/main/docs/src/examples.md>

== Usage Examples

=== Basic usage

[source,sql]
----
SELECT generate_query('find all users');
----

Sample output:

----
[INFO] Text generation successful - model: claude-sonnet-4-5-20250929, response_id: msg_20260209135507cc16362d5d324ccd

generate_query
--------------------------------------------------------
SELECT * FROM public.users LIMIT 1000;
+
-- Explanation:
-- Retrieves all columns and rows from the users table.
+
-- Warning: INFO: Applied LIMIT 1000 to prevent large result sets. Remove LIMIT if you need all data.
+
-- Note: Row limit was automatically applied to this query for safety
(1 row)
----

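The returned value bundles the SQL statement with `--` comment lines. To review only the statement before running it, one option is to call psql in unaligned, tuples-only mode (`-At`) and filter out the comments. The sketch below assumes the output layout shown above; `strip_sql_comments` is a hypothetical helper and `mydb` is a placeholder database name.

```shell
# Keep only the SQL statement lines: drop "--" comment lines and blank lines.
# (strip_sql_comments is a hypothetical helper for illustration.)
strip_sql_comments() {
    grep -v -e '^[[:space:]]*--' -e '^[[:space:]]*$'
}

# Example (needs a running server with pg_ai_query installed; "mydb" is a placeholder):
# psql -d mydb -At -c "SELECT generate_query('find all users');" | strip_sql_comments
```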
Run the query:

[source,sql]
----
SELECT * FROM public.users LIMIT 1000;
----

Output:

----
 id |     name      |       email       | age |         created_at         |     city
----+---------------+-------------------+-----+----------------------------+---------------
  1 | Alice Johnson | [email protected] |  28 | 2026-02-04 15:47:55.208111 | New York
  2 | Bob Smith     | [email protected] |  35 | 2026-02-04 15:47:55.208111 | San Francisco
  3 | Carol Davis   | [email protected] |  31 | 2026-02-04 15:47:55.208111 | Chicago
  4 | David Wilson  | [email protected] |  27 | 2026-02-04 15:47:55.208111 | Seattle
  5 | Eva Brown     | [email protected] |  33 | 2026-02-04 15:47:55.208111 | Boston
(5 rows)
----

=== generate_query examples

*Generate test data*

[source,sql]
----
SELECT generate_query('generate 100 user records and insert them into users');
----

Output:

----
[INFO] Text generation successful - model: claude-sonnet-4-5-20250929, response_id: msg_2026021114092101601c5650864a2d

generate_query
--------------------------------------------------------------------------------------------------------
INSERT INTO public.users (name, email, age, city, status)
SELECT 'User_' || generate_series AS name,
'user' || generate_series || '@example.com' AS email,
(18 + (generate_series % 50)) AS age,
(ARRAY['Beijing','Shanghai','Guangzhou','Shenzhen','Hangzhou'])[1 + (generate_series % 5)] AS city,
'active' AS status
FROM generate_series(1, 100);
+
-- Explanation:
-- Generates 100 mock user records and inserts them into the users table. The data includes auto-generated names, unique email addresses, random ages (18-67), random cities, and a default status.
+
-- Warnings:
-- 1. INFO: Relies on the id column of the users table having an auto-increment DEFAULT; id is not inserted manually.
-- 2. INFO: Uses the generate_series function to produce the sequence data, which is a PostgreSQL/IvorySQL feature.
-- 3. WARN: Make sure the users table is empty or the id sequence does not conflict before running; otherwise duplicate rows may be inserted.
-- 4. WARN: The email format is a simple mock; a real environment may need more complex logic or deduplication checks.
(1 row)
----

*Case-insensitive query*

[source,sql]
----
SELECT generate_query('show users from beijing, beijing is non-Case insensitive');
----

Output:

----
[INFO] Text generation successful - model: claude-sonnet-4-5-20250929, response_id: msg_20260211142845878f5f1a5a2f44a7

generate_query
-----------------------------------------
SELECT id, name, email, age, created_at, city, status
FROM public.users
WHERE LOWER(city) = LOWER('beijing') LIMIT 100;
+
-- Explanation:
-- Selects all user details for users located in Beijing, performing a case-insensitive match on the city column.
+
-- Warnings:
-- 1. INFO: Using LOWER() on both sides ensures case-insensitive matching but may prevent the database from using a standard index on the city column if one exists.
-- 2. INFO: Row limit of 100 applied to prevent large result sets.
+
-- Note: Row limit was automatically applied to this query for safety
(1 row)
----

=== explain_query example

[source,sql]
----
SELECT explain_query('SELECT * FROM orders WHERE user_id = 12');
----

Output:

----
[INFO] Text generation successful - model: claude-sonnet-4-5-20250929, response_id: msg_20260211175909d47a6871bcca4897

explain_query
--------------------------------------------------------------------------------------------------------------
1. Query overview
+
- This query retrieves all records (SELECT *) from the orders table where user_id equals 12.
- It is a typical query that filters data on a specific column (user_id).
+
2. Performance summary
+
- Total execution time: 0.021 ms
- Planning time: 0.430 ms
- Total cost: 18.12
- Rows returned: 0 (Actual Rows: 0)
- Rows scanned: 0 (Rows Removed by Filter: 0)
+
3. Execution plan analysis
+
- Key step: sequential scan
- The database performed a full table scan on the orders table.
- The planner estimated 3 rows, but execution actually returned 0 rows.
- Filter condition: orders.user_id = 12, which means the database must read every row in the table to check this condition.
+
4. Performance issues
+
- Full table scan risk: although the table is currently very small (execution time is only 0.021 ms), the use of a Seq Scan (sequential scan) means the database did not use an index. If the orders table grows to millions of rows over time, this query will become extremely slow (high I/O cost).
- Missing index: the plan shows that no index was used to locate rows with user_id = 12, which suggests that the necessary B-Tree index on the user_id column may be missing.
+
5. Optimization recommendations
+
- Main recommendation: create an index on the user_id column to avoid the full table scan. This changes the query from O(N) (scanning all rows) to O(log N) (index lookup).
- SQL example:
+
CREATE INDEX idx_orders_user_id ON orders(user_id);
+
6. Index recommendations
+
- Recommended index: a B-Tree index on the user_id column of the orders table.
- Rationale: the query condition is an equality comparison (=) on user_id. Once the index exists, IvorySQL (PostgreSQL) can use it to locate the data quickly, significantly reducing query time and resource usage, especially with large data volumes.
(1 row)
----

== Best Practices

=== Prompt writing tips

* **Use English**: although the AI supports many languages, English gives the best results
* **Know your database schema**: the better you understand the schema, the more accurate the generated queries
* **Iterate**: start with a broad request, then add detail step by step to improve the result
* **Be specific**: if you know the relevant tables or columns, mention them in the prompt; this helps the AI generate precise queries

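As a sketch of the last two tips, the hypothetical helper below wraps a prompt in a `generate_query` call and prints a broad prompt followed by a refined one that names a table and its columns. It doubles single quotes so that the prompt remains a single SQL string literal, assuming the default `standard_conforming_strings = on`. The prompts are illustrative and reuse the `public.users` table from the examples above; the printed statements can then be run in psql.

```shell
# Wrap a natural-language prompt in a generate_query() call, doubling any
# single quotes so the prompt stays one valid SQL string literal.
# (build_generate_query is a hypothetical helper for illustration.)
build_generate_query() {
    escaped="$(printf '%s' "$1" | sed "s/'/''/g")"
    printf "SELECT generate_query('%s');\n" "$escaped"
}

# A broad first attempt, then a refined prompt that names the table and columns.
build_generate_query "show users"
build_generate_query "show name and email from public.users where city is 'Beijing'"
```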

=== Error handling example

When a query refers to a table that does not exist, the system returns an error message:

[source,sql]
----
SELECT generate_query('list all products and prices');
----

Error output:

----
[INFO] Text generation successful - model: claude-sonnet-4-5-20250929, response_id: msg_20260209135642777cbc5c82ca4a85

ERROR: Query generation failed: Cannot generate query. Referenced table(s) for 'products' or 'goods' do not exist in the database. Available tables: public.orders, public.student_scores, public.users, sys.dual
----

In this case, the AI reports the list of available tables, which helps the user adjust the query.
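For scripted use, the table list can be pulled out of the error text. This is a sketch that assumes the message keeps the `Available tables:` wording shown above, which may vary between runs or models; `available_tables` is a hypothetical helper.

```shell
# Extract the comma-separated table list that follows "Available tables:".
# (available_tables is a hypothetical helper for illustration.)
available_tables() {
    sed -n 's/.*Available tables: //p'
}

# Example using the error text from the output above, printed one table per line.
msg="ERROR: Query generation failed: Cannot generate query. Referenced table(s) for 'products' or 'goods' do not exist in the database. Available tables: public.orders, public.student_scores, public.users, sys.dual"
printf '%s\n' "$msg" | available_tables | tr ',' '\n' | sed 's/^ *//'
```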

EN/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@
 *** xref:master/ecosystem_components/pg_cron.adoc[pg_cron]
 *** xref:master/ecosystem_components/pgsql_http.adoc[pgsql-http]
 *** xref:master/ecosystem_components/plpgsql_check.adoc[plpgsql_check]
+*** xref:master/ecosystem_components/pg_ai_query.adoc[pg_ai_query]
 *** xref:master/ecosystem_components/pgroonga.adoc[pgroonga]
 *** xref:master/ecosystem_components/pgaudit.adoc[pgaudit]
 *** xref:master/ecosystem_components/pgrouting.adoc[pgrouting]
