# TT-series

A high-performance engine for storing time-series data in Redis.

TT-series stores time-series data in Redis sorted sets. A sorted set keeps members ordered by score under a single key, but it has a weakness for storing records: members must be unique, so inserting a time-series entry whose value equals a previous one only updates that member's score instead of adding a new record. TT-series provides a solution to this problem.

TT-series supports Redis versions > 3.0; Redis 5.0 support is planned.
## Tips

- **Max stored series length**: For 32-bit Redis on a 32-bit platform, a sorted set can hold at most 2**32 - 1 members; for 64-bit Redis on a 64-bit platform, at most 2**64 - 1 members. However, very large data sets cause more CPU activity, so it is important to keep the record count balanced against performance.
- **Only supports Python 3.6+**: Python 3.6 changed the dictionary implementation for better performance; as a result, dictionaries in Python 3.6 are insertion-ordered, which TT-series relies on. See: https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6
- **Performance tips**: hiredis-py is targeted at speeding up the parsing of multi-bulk replies from redis-server, so for large bulk inserts into or reads from redis-server it can deliver a significant performance improvement.
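As an illustration of why no code changes are needed: redis-py switches to the hiredis-backed reply parser automatically whenever the `hiredis` package is importable. A minimal check (this probing snippet is an illustration, not part of the ttseries API):

```python
# hiredis is optional: when it is importable, redis-py automatically
# uses its C-based reply parser for faster bulk responses.
try:
    import hiredis  # noqa: F401
    has_hiredis = True
except ImportError:
    has_hiredis = False

print("fast parser enabled" if has_hiredis else "pure-Python parser in use")
```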
## Install

Install the Python package from the pip release:

```shell
pip install ttseries
```
## Documentation

## Features

- Fast: inserts 100,000 records in 1-2 s and reads a slice of 100,000 records in 0.4-0.5 s.
- Supports data serialization, with MessagePack enabled by default.
- Supports data compression.
- Supports a Redis hashes time-series storage format, to avoid updating previous records.
- Supports the numpy ndarray data type.
- Supports a max length that automatically trims records.
## Usage

TT-series provides four implementations to support different kinds of time-series data:

- `RedisSimpleTimeSeries`: stores records in plain sorted sets, so previously stored records affect newly inserted records whose values are not unique.
- `RedisHashTimeSeries`: uses Redis sorted sets together with hashes to store time-series data. Users don't need to worry about duplicate record values, but the extra hash keys cost some additional memory.
- `RedisNumpyTimeSeries`: supports `numpy.ndarray` time-series records stored in a Redis sorted set.
- `RedisPandasTimeSeries`: supports `pandas.DataFrame` time-series records stored in a Redis sorted set.
## Serializing Data

TT-series uses MsgPack to serialize data because, compared with other data serialization solutions, MsgPack offers better performance. If you don't want to use MsgPack, inherit from the `ttseries.BaseSerializer` class and implement its serializer methods.
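As a sketch of what such a serializer looks like: it only needs to encode a record to bytes and decode it back. The `dumps`/`loads` method names are an assumption here; check the `ttseries.BaseSerializer` source for the exact interface. The class below is self-contained (it does not subclass `BaseSerializer`) so it can run standalone:

```python
import json


class JsonSerializer:
    """Hypothetical JSON serializer sketch; with ttseries installed you
    would inherit from ttseries.BaseSerializer and override its methods."""

    def dumps(self, data):
        # encode a record to bytes for storage in redis
        return json.dumps(data).encode("utf-8")

    def loads(self, data):
        # decode bytes read back from redis
        return json.loads(data)


serializer = JsonSerializer()
payload = serializer.dumps([1536157765.0, 42])
assert serializer.loads(payload) == [1536157765.0, 42]
```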
## Examples

### RedisSimpleTimeSeries & RedisHashTimeSeries & RedisNumpyTimeSeries & RedisPandasTimeSeries

All four implementations provide the same functions and methods; the sections below point out where they differ.
Prepare data records:

```python
from datetime import datetime

from redis import StrictRedis

now = datetime.now()
timestamp = now.timestamp()

series_data = []
for i in range(1000):
    series_data.append((timestamp + i, i))

client = StrictRedis()  # redis client
```
#### Add records

```python
from ttseries import RedisSimpleTimeSeries

simple_series = RedisSimpleTimeSeries(client=client)
key = "TEST:SIMPLE"

simple_series.add_many(key, series_data)
```
#### Count record length

Get the total length of the records, or count only the records within a timestamp span.

```python
# get the total record count
simple_series.length(key)
# result: 1000

# count the records between a start and end timestamp
simple_series.count(key, from_timestamp=timestamp, end_timestamp=timestamp + 10)
# result: 11
```
#### Trim records

Trim records in ascending timestamp order.

```python
simple_series.trim(key, 10)  # trim 10 records
```
#### Delete a timestamp span

Delete the key with all its records, or delete only the records from a start timestamp to an end timestamp.

```python
simple_series.delete(key)  # delete the key with all records
simple_series.delete(key, start_timestamp=timestamp)  # delete records from the start timestamp
```
#### Get a slice

Get a slice of the records by start timestamp and end timestamp, ordered ASC or DESC (default order: ASC).

To select records strictly greater than (>) or less than (<) a timestamp, i.e. excluding the record at that exact timestamp, prefix the timestamp with `(`, as in `(timestamp`.

```python
# get series data from the start timestamp, ordered ASC
simple_series.get_slice(key, start_timestamp=timestamp, asc=True)

# get series data strictly greater than the start timestamp, ordered ASC
simple_series.get_slice(key, start_timestamp="(" + str(timestamp), asc=True)

# get series data from the start timestamp, limited to 500 records
simple_series.get_slice(key, start_timestamp=timestamp, limit=500)
```
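The `(` prefix follows Redis's ZRANGEBYSCORE exclusive-interval syntax, so building an exclusive bound is plain string concatenation (the timestamp value below is made up):

```python
timestamp = 1536157765.0

# "(1536157765.0" means strictly greater than the timestamp,
# mirroring Redis ZRANGEBYSCORE exclusive intervals
exclusive_start = "(" + str(timestamp)
print(exclusive_start)  # (1536157765.0
```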
#### iter

Yield each item from the records.

```python
for item in simple_series.iter(key):
    print(item)
```
### RedisNumpyTimeSeries

Numpy array support works with a `numpy.dtype` or with plain arrays of data.

To create records with a `numpy.dtype`, you must provide the `timestamp_column_name` and `dtype` parameters.

```python
import numpy as np

from ttseries import RedisNumpyTimeSeries

dtype = [("timestamp", "float64"), ("value", "i")]
array = np.array(series_data, dtype=dtype)

np_series = RedisNumpyTimeSeries(client=client, dtype=dtype, timestamp_column_name="timestamp")
```

Or use a plain numpy array without a dtype, in which case you must provide the `timestamp_column_index` parameter.

```python
array = np.array(series_data)
np_series = RedisNumpyTimeSeries(client=client, timestamp_column_index=0)
```
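To see what the structured array holds before it reaches Redis, the dtype from the snippet above can be exercised standalone; named columns can be addressed directly, which is how a `timestamp_column_name` lookup can locate the scores (the sample data here is made up):

```python
import numpy as np

# same dtype as above: a named float timestamp column plus an integer value
dtype = [("timestamp", "float64"), ("value", "i")]
series_data = [(1536157765.0 + i, i) for i in range(3)]

array = np.array(series_data, dtype=dtype)

# address the columns by name
print(array["value"])  # [0 1 2]
```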
### RedisPandasTimeSeries

Pandas time-series support uses a `pandas.DataFrame` to store time-series data in Redis.

To initialize the class, you must provide the `columns` and `dtypes` parameters:

- `columns`: the column names of the `pandas.DataFrame`.
- `dtypes`: the dtype of each column in the DataFrame, for example: `{"value1": "int64", "value2": "float32"}`. Reference: http://pbpython.com/pandas_dtypes.html

```python
from datetime import datetime

import pandas

from ttseries import RedisPandasTimeSeries

key = "AA:MIN"
now = datetime.now()
columns = ["value"]

date_range = pandas.date_range(now, periods=10, freq="1min")
data_frame = pandas.DataFrame([i + 1 for i in range(len(date_range))],
                              index=date_range, columns=columns)

dtypes = {"value": "int64"}

pandas_ts = RedisPandasTimeSeries(client=client, columns=columns, dtypes=dtypes)
```
#### add

Add a single time-series record to Redis. The `series` parameter takes a `pandas.Series`, and note in particular that the series name must be a timestamp label from a `pandas.DatetimeIndex`.

```python
series_item = data_frame.iloc[0]
pandas_ts.add(key, series_item)
```
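The requirement on the series name can be checked directly: selecting a row with `.iloc` yields a `pandas.Series` whose `name` is the row's `DatetimeIndex` label (the sample frame below is a hypothetical stand-in for `data_frame`):

```python
import pandas as pd

frame = pd.DataFrame({"value": [1, 2]},
                     index=pd.date_range("2018-09-05", periods=2, freq="1min"))

row = frame.iloc[0]  # a pandas.Series

# the DatetimeIndex label becomes the series name, so it is a Timestamp
print(type(row.name))
```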
#### add_many

Add a large `pandas.DataFrame` to Redis; the DataFrame index must be a `pandas.DatetimeIndex`.

For better insert performance, use the `chunks_size` parameter to split the DataFrame into chunks with a fixed number of rows.

```python
pandas_ts.add_many(key, data_frame)
```
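The splitting that `chunks_size` performs can be sketched standalone with `iloc` slices (the snippet below is an illustration of the idea, not ttseries' internal code):

```python
import pandas as pd

frame = pd.DataFrame({"value": range(10)},
                     index=pd.date_range("2018-09-05", periods=10, freq="1min"))

# split into fixed-size row chunks, as a chunks_size of 4 would
chunk_size = 4
chunks = [frame.iloc[i:i + chunk_size]
          for i in range(0, len(frame), chunk_size)]

print([len(chunk) for chunk in chunks])  # [4, 4, 2]
```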
#### iter & get

Retrieve records from the Redis sorted set; both methods return `pandas.Series`.

```python
# yield each record from redis
for item in pandas_ts.iter(key):
    print(item)

# return the one record with a specific timestamp
pandas_ts.get(key, 1536157765.464465)
```
#### get_slice

Retrieve a slice of records by `start timestamp` and/or `end timestamp`, with a `limit` on the length. Returns a `pandas.DataFrame`.

```python
# return records from start timestamp 1536157765.464465
result_frame = pandas_ts.get_slice(key, start_timestamp=1536157765.464465)

# return records from start timestamp 1536157765.464465 to end timestamp 1536157780.464465
result2_frame = pandas_ts.get_slice(key, start_timestamp=1536157765.464465, end_timestamp=1536157780.464465)
```
## Benchmark

Run `make benchmark-init`, then run `make benchmark-test`.

Example benchmark test reports can be found in the benchmark directory.
## TODO

- Support Redis 5.0
- Support Redis cluster

## Author

- Winton Wang

## Donate

## Contact

Email: 365504029@qq.com