PostgreSQL插件之TimescaleDB

TimescaleDB

TimescaleDB是一个开源的PostgreSQL的时序数据库扩展。基于PostgreSQL的强大能力,TimescaleDB能够提供跨时间和空间的自动分区以及完整的SQL支持。

TimescaleDB既像传统的关系型数据库一样支持全量SQL,又像NoSQL型数据库一样可扩展。它具有如下的优势:

简单易用

  •     PostgreSQL原生支持的所有的SQL接口(包括二级索引,基于非时间的聚合,子查询,JOIN,窗口函数)
  •     无需更改任何连接和使用PostgreSQL的客户端和工具
  •     面向时间的功能、API函数以及优化
  •    强大的数据保存策略的支持

扩展性强

  •     透明的时间/空间分区功能,可进行向上和向外扩展
  •     高数据写入率(包括批处理提交,内存索引,事务支持,对数据回填的支持)
  •     单个节点上大小合适的块(二维数据分区)可确保即使在大数据量下也能快速提取
  •     跨块和服务器的并行操作

可靠性高

  •     基于可靠的PostgreSQL
  •     受益于PostgreSQL 20多年研究(包括流复制,备份)的基础
  •     灵活的管理选项(与现有的PostgreSQL生态系统和工具兼容)

更多关于TimescaleDB的信息,请参考[官方文档](https://docs.timescale.com/latest/main)

系统要求

  • PostgreSQL 9.6+

使用说明

创建插件

CREATE EXTENSION IF NOT EXISTS timescaledb;

删除插件

DROP EXTENSION IF EXISTS timescaledb;

使用示例1

1、创建普通测试表

postgres=# CREATE TABLE conditions (
postgres(#   time        TIMESTAMPTZ       NOT NULL,
postgres(#   location    TEXT              NOT NULL,
postgres(#   temperature DOUBLE PRECISION  NULL,
postgres(#   humidity    DOUBLE PRECISION  NULL
postgres(# );
CREATE TABLE

2、基于time分区将上一步创建的普通表转换为超表

postgres=# SELECT create_hypertable('conditions', 'time');
    create_hypertable    
-------------------------
 (1,public,conditions,t)
(1 row)

postgres=# 
  • 关于超表概念以及如何创建超表,请参考超表和创建超表

3、插入数据并查询

postgres=# INSERT INTO conditions(time, location, temperature, humidity)
SELECT now(), to_char(i, 'FM0000'), random()*i, random()*i FROM generate_series(1,10000) i;
INSERT 0 10000
postgres=# SELECT * FROM conditions ORDER BY time DESC LIMIT 10;
             time              | location |    temperature     |      humidity      
-------------------------------+----------+--------------------+--------------------
 2021-02-04 14:25:05.861412+08 | 0001     | 0.8499118449648186 | 0.7081022857469108
 2021-02-04 14:25:05.861412+08 | 0002     |  1.032869253930933 | 0.6919497591141521
 2021-02-04 14:25:05.861412+08 | 0003     |  2.181961480555376 |  2.798874636756544
 2021-02-04 14:25:05.861412+08 | 0004     | 2.0593786791351505 | 3.1111923049135726
 2021-02-04 14:25:05.861412+08 | 0005     |  4.631874193357941 |  1.152362876862405
 2021-02-04 14:25:05.861412+08 | 0006     | 3.1238939110304145 | 3.5332473749587052
 2021-02-04 14:25:05.861412+08 | 0007     | 3.0055947001481975 |  2.681226072554999
 2021-02-04 14:25:05.861412+08 | 0008     |   5.97665218304374 |  4.776195580392255
 2021-02-04 14:25:05.861412+08 | 0009     | 0.4395944889294263 |   3.79820483119601
 2021-02-04 14:25:05.861412+08 | 0010     |  5.213994089740517 |  6.613006901233973
(10 rows)

---针对过去3小时的数据,每15分钟采集度量一次,按照时间和温度降序排序
postgres=# SELECT time_bucket('15 minutes', time) AS fifteen_min,
postgres-#     location, COUNT(*),
postgres-#     MAX(temperature) AS max_temp,
postgres-#     MAX(humidity) AS max_hum
postgres-#   FROM conditions
postgres-#   WHERE time > NOW() - interval '3 hours'
postgres-#   GROUP BY fifteen_min, location
postgres-#   ORDER BY fifteen_min DESC, max_temp DESC;
      fifteen_min       | location | count |       max_temp       |        max_hum         
------------------------+----------+-------+----------------------+------------------------
 2021-02-04 14:15:00+08 | 9974     |     1 |    9959.154515189259 |      8393.882609488966
 2021-02-04 14:15:00+08 | 9963     |     1 |    9893.781627907236 |      7968.817036613598
 2021-02-04 14:15:00+08 | 9852     |     1 |    9831.658490260146 |      4594.882403274832
 2021-02-04 14:15:00+08 | 9962     |     1 |    9668.102999374136 |      9060.332750304518
 2021-02-04 14:15:00+08 | 9807     |     1 |     9632.48883539376 |      6923.888752408443
 2021-02-04 14:15:00+08 | 9627     |     1 |    9613.157259692176 |      9241.248136652985
 2021-02-04 14:15:00+08 | 9800     |     1 |    9590.790872497493 |        2464.9034796248
 2021-02-04 14:15:00+08 | 9600     |     1 |    9537.116866999975 |     3404.4305571978384
 2021-02-04 14:15:00+08 | 9954     |     1 |    9523.032966159419 |      1555.249976678025
 2021-02-04 14:15:00+08 | 9908     |     1 |    9499.432451454542 |      1110.053567264063
 2021-02-04 14:15:00+08 | 9806     |     1 |    9481.471221781818 |      8187.133268036299
 2021-02-04 14:15:00+08 | 9552     |     1 |    9378.450791195617 |      5158.115086741863
 2021-02-04 14:15:00+08 | 9451     |     1 |    9371.378793857662 |     2626.3288166080547
 2021-02-04 14:15:00+08 | 9654     |     1 |    9340.183911422057 |      8080.395366505775
 2021-02-04 14:15:00+08 | 9419     |     1 |    9337.334558829742 |      1577.535230574912
 2021-02-04 14:15:00+08 | 9929     |     1 |    9335.892011237425 |      7598.584466488536
 2021-02-04 14:15:00+08 | 9803     |     1 |    9333.235219857552 |      6644.544463200405
 2021-02-04 14:15:00+08 | 9899     |     1 |     9327.24793091667 |     1563.9061617953776
 2021-02-04 14:15:00+08 | 9750     |     1 |     9327.22125169839 |     3425.3518007559906
 2021-02-04 14:15:00+08 | 9914     |     1 |    9326.780445982797 |     7394.6374437081995
 2021-02-04 14:15:00+08 | 9872     |     1 |    9302.263030958715 |     1101.1974349270733
 2021-02-04 14:15:00+08 | 9581     |     1 |    9277.185015285479 |      6663.525557202216
 2021-02-04 14:15:00+08 | 9708     |     1 |    9273.063941424782 |      3175.170956246176
 2021-02-04 14:15:00+08 | 9848     |     1 |    9260.316420825136 |      3523.155521805003
 2021-02-04 14:15:00+08 | 9632     |     1 |    9259.105259260195 |      227.1827826256217
 2021-02-04 14:15:00+08 | 9466     |     1 |     9254.85035699184 |     2616.3724030114145
 2021-02-04 14:15:00+08 | 9860     |     1 |     9247.49280995008 |      9379.583724363356
 2021-02-04 14:15:00+08 | 9235     |     1 |    9226.944980400396 |     1061.4152797323275
 ......
  • 关于time_bucket()等TimescaleDB提供的API函数,请参考API Functions

使用示例2

  • 该示例演示了如何分析纽约市出租车的运营情况,关于示例数据及更多信息介绍,请参考官方Tutorial

1、创建示例数据库及环境

postgres=# CREATE DATABASE nyc_data;
CREATE DATABASE
postgres=# \c nyc_data
You are now connected to database "nyc_data" as user "pg12".
nyc_data=# CREATE EXTENSION IF NOT EXISTS timescaledb;

2、创建普通测试表、超表及索引

nyc_data=# CREATE TABLE "rides"(
nyc_data(#     vendor_id TEXT,
nyc_data(#     pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
nyc_data(#     dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
nyc_data(#     passenger_count NUMERIC,
nyc_data(#     trip_distance NUMERIC,
nyc_data(#     pickup_longitude  NUMERIC,
nyc_data(#     pickup_latitude   NUMERIC,
nyc_data(#     rate_code         INTEGER,
nyc_data(#     dropoff_longitude NUMERIC,
nyc_data(#     dropoff_latitude  NUMERIC,
nyc_data(#     payment_type INTEGER,
nyc_data(#     fare_amount NUMERIC,
nyc_data(#     extra NUMERIC,
nyc_data(#     mta_tax NUMERIC,
nyc_data(#     tip_amount NUMERIC,
nyc_data(#     tolls_amount NUMERIC,
nyc_data(#     improvement_surcharge NUMERIC,
nyc_data(#     total_amount NUMERIC
nyc_data(# );
CREATE TABLE
nyc_data=# SELECT create_hypertable('rides', 'pickup_datetime', 'payment_type', 2, create_default_indexes=>FALSE);
 create_hypertable  
--------------------
 (1,public,rides,t)
(1 row)

nyc_data=# CREATE INDEX ON rides (vendor_id, pickup_datetime DESC);
CREATE INDEX
nyc_data=# CREATE INDEX ON rides (rate_code, pickup_datetime DESC);
CREATE INDEX
nyc_data=# CREATE INDEX ON rides (passenger_count, pickup_datetime DESC);
CREATE INDEX

3、创建辅助表并插入数据

nyc_data=# CREATE TABLE IF NOT EXISTS "payment_types"(
nyc_data(#     payment_type INTEGER,
nyc_data(#     description TEXT
nyc_data(# );
CREATE TABLE
nyc_data=# INSERT INTO payment_types(payment_type, description) VALUES
nyc_data-# (1, 'credit card'),
nyc_data-# (2, 'cash'),
nyc_data-# (3, 'no charge'),
nyc_data-# (4, 'dispute'),
nyc_data-# (5, 'unknown'),
nyc_data-# (6, 'voided trip');
INSERT 0 6
nyc_data=# CREATE TABLE IF NOT EXISTS "rates"(
nyc_data(#     rate_code   INTEGER,
nyc_data(#     description TEXT
nyc_data(# );
CREATE TABLE
nyc_data=# INSERT INTO rates(rate_code, description) VALUES
nyc_data-# (1, 'standard rate'),
nyc_data-# (2, 'JFK'),
nyc_data-# (3, 'Newark'),
nyc_data-# (4, 'Nassau or Westchester'),
nyc_data-# (5, 'negotiated fare'),
nyc_data-# (6, 'group ride');
INSERT 0 6

4、向测试表导入测试数据

nyc_data=# \COPY rides FROM nyc_data_rides.csv CSV;
COPY 10906860

5、分析处理数据

--- 2016年1月每天都多少次打车行为
nyc_data=# SELECT date_trunc('day', pickup_datetime) as day, COUNT(*) FROM rides GROUP BY day ORDER BY day;
         day         | count  
---------------------+--------
 2016-01-01 00:00:00 | 345037
 2016-01-02 00:00:00 | 312831
 2016-01-03 00:00:00 | 302878
 2016-01-04 00:00:00 | 316171
 2016-01-05 00:00:00 | 343251
 2016-01-06 00:00:00 | 348516
 2016-01-07 00:00:00 | 364894
 2016-01-08 00:00:00 | 392070
 2016-01-09 00:00:00 | 405825
 2016-01-10 00:00:00 | 351788
 2016-01-11 00:00:00 | 342651
 2016-01-12 00:00:00 | 367390
 2016-01-13 00:00:00 | 395090
 2016-01-14 00:00:00 | 396473
 2016-01-15 00:00:00 | 401289
 2016-01-16 00:00:00 | 411899
 2016-01-17 00:00:00 | 379156
 2016-01-18 00:00:00 | 341481
 2016-01-19 00:00:00 | 385187
 2016-01-20 00:00:00 | 382105
 2016-01-21 00:00:00 | 399654
 2016-01-22 00:00:00 | 420162
 2016-01-23 00:00:00 |  78133
 2016-01-24 00:00:00 | 159766
 2016-01-25 00:00:00 | 282087
 2016-01-26 00:00:00 | 327655
 2016-01-27 00:00:00 | 359180
 2016-01-28 00:00:00 | 383326
 2016-01-29 00:00:00 | 414039
 2016-01-30 00:00:00 | 435369
 2016-01-31 00:00:00 | 361505
 2017-11-17 00:00:00 |      2
(32 rows)

---前7天平均每个顾客的打车费是多少
nyc_data=# SELECT date_trunc('day', pickup_datetime) AS day, avg(fare_amount)
nyc_data-#     FROM rides WHERE passenger_count = 1 AND pickup_datetime < '2016-01-08'
nyc_data-#     GROUP BY day ORDER BY day;
         day         |         avg         
---------------------+---------------------
 2016-01-01 00:00:00 | 12.5464748850129787
 2016-01-02 00:00:00 | 12.1129878886746750
 2016-01-03 00:00:00 | 12.8262352076841150
 2016-01-04 00:00:00 | 11.9116533573721472
 2016-01-05 00:00:00 | 11.7534235580737452
 2016-01-06 00:00:00 | 11.7824805635293235
 2016-01-07 00:00:00 | 11.9498961299166930
(7 rows)

---一个月内每种费率类型的次数是多少
nyc_data=# SELECT rate_code, COUNT(vendor_id) AS num_trips 
nyc_data-#     FROM rides WHERE pickup_datetime < '2016-02-01'
nyc_data-#     GROUP BY rate_code ORDER BY rate_code;
 rate_code | num_trips 
-----------+-----------
         1 |  10626315
         2 |    225019
         3 |     16822
         4 |      4696
         5 |     33688
         6 |       102
        99 |       216
(7 rows)

---2016年第一天每5分钟有多少次乘车?
nyc_data=# SELECT time_bucket('5 minute', pickup_datetime) AS five_min, count(*)
nyc_data-#     FROM rides WHERE pickup_datetime < '2016-01-02 00:00'
nyc_data-#     GROUP BY five_min ORDER BY five_min;
      five_min       | count 
---------------------+-------
 2016-01-01 00:00:00 |   703
 2016-01-01 00:05:00 |  1482
 2016-01-01 00:10:00 |  1959
 2016-01-01 00:15:00 |  2200
 2016-01-01 00:20:00 |  2285
 2016-01-01 00:25:00 |  2291
 2016-01-01 00:30:00 |  2349
 2016-01-01 00:35:00 |  2328
 2016-01-01 00:40:00 |  2440
 2016-01-01 00:45:00 |  2372
 2016-01-01 00:50:00 |  2388
 2016-01-01 00:55:00 |  2473
 2016-01-01 01:00:00 |  2395
 2016-01-01 01:05:00 |  2510
 2016-01-01 01:10:00 |  2412
 2016-01-01 01:15:00 |  2482
 2016-01-01 01:20:00 |  2428
 2016-01-01 01:25:00 |  2433
 2016-01-01 01:30:00 |  2337
 2016-01-01 01:35:00 |  2366
 2016-01-01 01:40:00 |  2325
 2016-01-01 01:45:00 |  2257
 2016-01-01 01:50:00 |  2316
 2016-01-01 01:55:00 |  2250
 2016-01-01 02:00:00 |  2303
 2016-01-01 02:05:00 |  2259
 2016-01-01 02:10:00 |  2253
 2016-01-01 02:15:00 |  2176
 2016-01-01 02:20:00 |  2138
 ......
合智互联客户成功服务热线:400-1565-661

admin
admin管理员

上一篇:【文末有福利】数仓性能调优必读,带你进阶为性能调优高手
下一篇:万物皆可联,牛年“牛”体验

留言评论

暂无留言
取消
扫码支持