一个在Apache License 2.0下发布的Elasticsearch开源JDBC驱动
esqlj是在Elastic Rest High Level API(版本7.11)之上构建的针对Elastic的JDBC驱动。有关Elastic许可证,请参阅 licenses/elastic-licenses
文件夹。
esqlj扩展了SQL语法,提供了全文查询、地理查询、形状查询、连接查询等高级Elastic查询功能。
Sql解析由jsqlparser库 JSQLParser提供。 相关许可证请参阅 licenses/JSqlParser
文件夹。
关于SQL实现,请参见下文“支持矩阵与约定”部分。
尚未达到生产可用级别
已实现DQL DDL和DML实现部分(update, insert, delete, drop)
JDBC URL必须遵循以下语法:
jdbc:esqlj:http<s>://<elastic_address_1>:<elastic_port_1>,http://<elastic_address_2>:<elastic_port_2>,...;param1=paramValue1;...
可以通过逗号分隔的URL列表声明一组连接池。
可选参数:
参数名 | 描述 | 默认值 |
---|---|---|
userName | 用户名 | - |
password | 密码 | - |
includeTextFieldsByDefault | 是否默认包含Elastic text 类型字段在 select * 中 |
false |
indexMetaDataCache | 缓存检索到的索引结构(建议保持此功能启用,因为获取这些信息可能是一项耗时的操作,特别是对于别名或星标索引查询)。对不可变索引来说,最好启用该选项 | true |
maxGroupByRetrievedElements | 对选定字段进行GROUP BY查询时的最大检索元素数 | 500 |
queryScrollFromRows | 第一次分页时获取的行数 | 500 |
queryScrollFetchSize | 下一次分页时获取的行数 | 500 |
queryScrollTimeoutMinutes | 分页之间超时时间,以分钟为单位 | 3 |
queryScrollOnlyByScrollApi | 如果为true,则分页将通过Elastic Scroll API执行。如果为false,将根据查询情况应用最适合的滚动策略(请参阅下方分页段落) | true |
sharedConnection | 如果设置为 true ,esqlj内部使用的Elastic客户端将在所有连接间静态共享(如果您在同一JVM内没有连接到不同Elastic集群的需求时使用此项) |
true |
Elasticsearch的索引被当作SQL表来管理。 Elasticsearch的别名则被当作SQL视图来管理。
包含特殊字符 如 *
, -
, .
的索引进行查询时,需要使用双引号包裹。 例如: 'SELECT * FROM ".test-index*"'
包含特殊字符 -
的字段和别名也必须使用双引号包裹。
Elasticsearch文档标识符"_id"可以在非聚合查询中获取,其类型为 string
并像索引主键一样映射到元数据上。此列也可以在Where条件中用于匹配查询(=、!=)。
搜索评分"_score"在非聚合查询中作为float类型的列返回。
标准SQL过滤语法的功能非常有限。esqlj支持使用Elastic API的全文查询、地理查询、形状查询等自定义语法对文档进行过滤。
目前仅实现了一部分这些高级过滤技术。以下是一个使用 Query string
ELastic API (query-dsl-query-string-query)的例子:
SELECT _id, _score FROM indexName WHERE query_string('(new york city) OR (big apple) OR name:/joh?n(ath[oa]n)/', 'field1, field2,city.*', 'minimum_should_match:2')
默认情况下,可检索的最大文档字段/列数被设置为100。
例如,这就解释了为什么默认情况下包含近500个字段的.kibana_*索引在执行select *时会返回错误。
若要根据您的需求增加此配置阈值,请按照需要更改Elastic设置:'index.max_docvalue_fields_search'
针对以'my-index'开头的索引更改最大文档阈值:
PUT /my-index*/_settings
{
"index" : {
"max_docvalue_fields_search" : 500
}
}
更改所有索引的最大文档阈值:
PUT /*/_settings
{
"index" : {
"max_docvalue_fields_search" : 500
}
}
在未来将不再有可能查询系统索引。
Elastic boolean fields are typed BOOLEAN on resultset. Use constants true
and false
to express conditions on these fields. Example:
SELECT * from \"esqlj-test-static-010\" WHERE booleanField=true
A sample usage of esqlj in DBeaver:
Tables are Elasticsearch indices. Views are Elasticsearch aliases
Index fields
Index documents
Sample SQL query
org.fpasti.jdbc.esqlj.EsDriver
jdbc:esqlj:http://{host}:{port}
esqlj-<rel>-shaded.jar
(available from project build)Add driver dependency in pom.xml:
<dependency>
<groupId>org.fpasti</groupId>
<artifactId>esqlj</artifactId>
<version>0.2.0</version>
</dependency>
DriverManager.registerDriver(new EsDriver());
Connection connection = DriverManager.getConnection("jdbc:esqlj:http://localhost:9200");
Statement stmt = null;
ResultSet rs = null;
try {
stmt = connection.createStatement();
rs = stmt.executeQuery("SELECT * from \"esqlj-test-static-010\" WHERE booleanField=true");
// print out column & fields
ResultSetMetaData rsm = rs.getMetaData();
for(int i = 1; i <= rs.getMetaData().getColumnCount(); i++) {
System.out.println(String.format("%d: Column: %s, Column Alias: %s, Type: %s", i, rsm.getColumnName(i), rsm.getColumnLabel(i), rsm.getColumnTypeName(i)));
}
// iterate over query res
while (rs.next()) {
System.out.println(String.format("_id: %s : doubleField: %f - keywordField: %s - textField: %s - score: %f", rs.getString(10), rs.getDouble(2), rs.getObject(5), rs.getString(8), rs.getFloat(11)));
}
} catch (SQLException ex) {
System.out.println("SQLException: " + ex.getMessage());
} finally {
if(stmt != null) {
stmt.close();
}
if(connection != null) {
connection.close();
}
}
Mapping of supported Elastic types to SQL types:
Elastic on index type | Metadata declared SQL Type | Java effective type |
---|---|---|
boolean | BOOL | Boolean |
date | TIMESTAMP | LocalDateTime |
date_nanos | TIMESTAMP | LocalDateTime |
doc_id | VARCHAR | String |
double | NUMBER | Double |
flattened | STRUCT | Object |
float | NUMBER | Float |
geo_point | STRUCT | EsGeoPoint |
half_float | NUMBER | Float |
integer | NUMBER | Integer |
ip | VARCHAR | String |
keyword | VARCHAR | String |
long | BIGINT | Long |
object | STRUCT | Object |
scaled_float | NUMBER | Float |
short | NUMBER | Byte |
text | VARCHAR | String |
unsigned_long | NUMBER | Long |
wildcard | VARCHAR | String |
默认情况下,esqlj通过Elastic Scroll API在查询上实现滚动策略。可选地,可以激活按顺序进行的更节省资源的滚动方式,但如果您想启用此功能,请注意在每个查询中至少包含一个tiebreaker字段(将来可能不再支持通过文档ID查询,将标识符存储在文档字段中可能成为一个最佳实践)。目前关于在查询结果中引入自动tiebreaker的RFC正在Elastic产品讨论中。但现在如果您启用了此功能而未添加对tiebreaker字段的排序,数据分页之间可能会跳过一些行。
关于按顺序滚动的话题,驱动程序并未使用Point in Time API(似乎Elastic Rest High level API中缺少对此的支持)。
请注意:Scroll API会在服务器端消耗资源。最佳实践是尽快获取所有所需数据。数据检索完成后,esqlj会自动释放滚动链接。
大部分测试单元需要一个实时的Elastic实例。 这些单元的激活由名为“ESQLJ_TEST_CONFIG”的系统变量控制。 环境变量必须连接有效的esqlj JDBC URL连接字符串以及单元内部查询所需文档的加载策略:
ESQLJ_TEST_CONFIG=jdbc:esqlj:http://<elastic_address>:<elastic_port>|<createAndDestroy or createOnly>
Parameters | Actions | Scope |
---|---|---|
createAndDestroy | Test units create index 'esqlj-test-volatile-<uuid>' on start and delete it on finish | Continuous Delivery/Deployment |
createOnly | Test units create index 'esqlj-test-static-<release.version>' and not delete it on finish. If it's just present on Elasticsearch it will be preserved. (Will be required a manual delete of it from system). | Development stage |
Sample configuration: ESQLJ_TEST_CONFIG="jdbc:esqlj:http://10.77.154.32:9080|createOnly"
If ESQLJ_TEST_CONFIG isn't declared, all tests depending from live connection will be skipped.
Actually supported SELECT [...] elements:
Select element | Notes |
---|---|
column |
Elastic document field |
column alias or column AS alias
|
Alias for field in query result |
* |
All document fields |
_id |
document identifier (string) |
_score |
document query search score (float) |
TO_CHAR(field, mask_date) |
Format date field. Example: TO_CHAR(timestampField, 'YYYY/MM/DD HH:MI:SS') . Supported mask: YEAR, YYYY, YY, MM, MONTH, MON, DDD, DD, HH24, HH12, HH, MI, SS, DAY, XFF, FFF, FF, F, PM, TZR, TZH. |
LATITUDE |
Extract latitude from EsGeoPoint field (not orderable) |
LONGITUDE |
Extract longitude from EsGeoPoint field (not orderable) |
COUNT(*) |
Number of documents in index: SELECT COUNT(*) FROM index
|
COUNT(field) |
Number of documents in index where specified field is present and not null: SELECT COUNT("object.keywordObjectField") FROM index
|
COUNT(DISTINCT field) |
Number of distinct values of specified field in index: SELECT COUNT(DISTINCT keywordField) FROM index
|
Supported GROUP BY query functions:
Function | Description | Example |
---|---|---|
AVG |
Average of values | SELECT AVG(integerField) FROM index |
COUNT(*) |
Number of documents | Number of documents in group: SELECT COUNT(*) FROM index
|
COUNT(field) |
Number of documents with specified field | Number of documents in group where specified field is present and not null: SELECT COUNT("object.keywordObjectField") FROM index
|
COUNT(DISTINCT field) |
Number of distinct values | Number of distinct values of specified field: SELECT COUNT(DISTINCT keywordField) FROM index
|
MAX |
Max column value | SELECT MAX(integerField) FROM index |
MIN |
Min column value | SELECT MIN(integerField) FROM index |
SUM |
Sum of values | SELECT SUM(integerField) FROM index |
It's possibile to query distinct values using DISTINCT
clause.
Example:SELECT DISTINCT keywordField, booleanField FROM index ORDER BY keywordField, booleanField DESC
You can use both column name or column alias in expression.
Expression condition | Notes |
---|---|
left expression = value
|
|
left expression != value
|
|
left expression > numeric_value
|
|
left expression >= numeric_value
|
|
left expression < numeric_value
|
|
left expression <= numeric_value
|
|
left expression LIKE expression
|
Implemented by Wildcard Elasticsearch filter. See Elasticsearch documentation about its usage (query-dsl-wildcard-query) |
left expression IS NULL |
|
left expression IS NOT NULL |
|
left expression BETWEEN a AND b
|
a and b could be NUMBER, STRING, date expressed by TO_DATE('date', 'mask_date'), EXTRACT function |
left expression IN (value1 , value2 , ...) |
|
query_string(...) |
Elastic raw query. See below for reference |
geo_bounding_box(...) |
Elastic raw query. See below for reference |
Expression |
---|
column |
alias |
EXTRACT(period from column ) |
value
=column
expression is for example considered invalid from esqlj
esqlj allow you to use specific Elastic query API.
Syntax usage is query_type
(param1
,param2
,...), where query_type
maps specific Elastic query; and param1
,param2
,... allows you to pass parameters to that query.
There are a set of mandatory parameters for every implemented custom query. It's possible also to set optional parameters for changing low level configuration query behaviour, like for example analyze_wildcard
, fuzzy_max_expansions
etc. These configuration settings must to be declared in this way:
query_string('search criteria','field1,field2,object.*','analyze_wildcard:true','fuzzy_max_expansions:15')
.
Esqlj will dynamically cast params value type according to expected parameter Elastic query object.
Currently implemented raw Elastic queries:
Elastic query | query_type | Parameters | Elastic reference |
---|---|---|---|
Query string | query_string | 1: query expression, 2: search on fields (* for all), 3..x: additional query parameters (see Elastic documentation) | string_query |
Geo bounding box | geo_bounding_box | 1: geopoint field, 2: top left latitude, 3: top left: longitude, 4: bottom right latitude, 5: bottom right longitude, 6..x extra params | geo_bounding_box |
Specific Elastic query functions samples
Query type | Sample |
---|---|
query_string | SELECT id, _score FROM indexName WHERE query_string('(new york city) OR (big apple) OR name:/joh?n(ath[oa]n)/', 'field1, field2,city.*', 'minimum_should_match:2') |
geo_bounding_box | SELECT _id, _score FROM indexName WHERE geo_bounding_box('geoPointField', 50, 8, 40.1, 10.2) |
Function name | Admitted on | Notes |
---|---|---|
SYSDATE |
Right expression | Current date time |
SYSDATE() |
Right expression | Current date time |
NOW() |
Right expression | Current date time |
GETDATE() |
Right expression | Current date time |
TRUNC(SYSDATE|SYSDATE()) |
Right expression | Current date |
TO_DATE(date, mask_date) |
Right expression | Supported mask: YEAR, YYYY, YY, MM, MONTH, MON, DDD, DD, HH24, HH12, HH, MI, SS, DAY, XFF, FFF, FF, F, PM, TZR, TZH. Example TO_DATE('2020/01/01', 'YYYY/MM/DD') |
EXTRACT(PERIOD FROM column) |
Left expression | PERIOD can be valued with YEAR , MONTH , DAY , HOUR , MINUTE , SECOND . Usage example: EXTRACT(YEAR FROM timestamp)!=2020 |
It's possible to aggregate values using GROUP BY clause.
Example: SELECT booleanField, AVG(integerField) test, SUM(longField), COUNT(doubleField), COUNT(*) from testIndex GROUP BY booleanField
It's possible to apply filtering on Group by expressions.
Example:
SELECT booleanField, AVG(integerField) test, SUM(longField), COUNT(*) from testIndex GROUP BY booleanField HAVING AVG(integerField)>=4 OR SUM(longField)>=19
Example:
SELECT * FROM column ORDER BY keywordField, integerField DESC
Example:
SELECT * FROM column LIMIT 100
Tested on 7.4.2 and 7.10.0 Elasticsearch release
Fabrizio Pasti
fabrizio.pasti@gmail.com
https://www.linkedin.com/in/fabrizio-pasti-2340a627
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。