On Ubuntu 18.04 you can query 

sudo service clickhouse-server

and it will output some info, like main pid, where is the server.log etc.


How to pipe commands from file to clickhouse-client?

cat myfile.txt | clickhouse-client --multiquery

Where is the data stored?


There are extra functions for IP addresses:

SELECT IPv4StringToNum(ip_src)   FROM pmacctflows  LIMIT 10

group by and order example:

SELECT class, ip_src, count(*) FROM pmacctflows  GROUP BY class, ip_src order by count() desc

How to analyze performance?

In order to analyze clickhouse-performance you can use of course their pretty good documentation–  or take hints from here:

  • Clickhouse is optimized for batch insertion. You should avoid many tiny inserts per second. 
  • If you use the merge-tree engine you should carefully choose the index key. Data gets stored ordered lexicographically according to that key. A key can be a column or a subset of columns or expressions.
  • Data in a MergeTree table is stored in “parts”. Each part stores data in the primary key order, so data is ordered lexicographically by the primary key tuple. All the table columns are stored in separate column.bin files in these parts. ” – from the clickhouse documentation.

To see, what is going on under the hood, you can increase the log level: 

clickhouse-client --send_logs_level=trace

I have created the table like this:

ENGINE = MergeTree()
PARTITION BY toYYYYMMDDhhmmss(timestamp_start) ORDER BY(timestamp_start);
CREATE MATERIALIZED VIEW flowdataKafka.pmacctflowstmp_mv to flowdataKafka.pmacctflows

in der clickhouse-server.log kann man dann Informationen über Queries bekommen: Z.B. habe ich gemäß des Zeitstempels sortiert. Dann möchte ich sehen, was passiert, wenn man ein select auführt. Die Log-Datei sagt folgendes:

(SelectExecutor): Selected 396 parts by partition key, 396 parts by primary key, 396 marks by primary key, 396 marks to read from 396 ranges

Was this helpful?

0 / 0