Spark

Spark Streaming

Here is an example of very simple streaming from and to a file with PySpark: https://gist.github.com/sallos-cyber/a14e03da49cc0c873a651628dba4d096
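The gist is not reproduced here. The following is a minimal sketch of the same idea using Structured Streaming, the DataFrame-based streaming API; the gist may do it differently. The function name `start_csv_to_csv_stream`, the `/tmp/...` paths and the two-column schema are hypothetical placeholders. File sources need an explicit schema, and the file sink needs a checkpoint location and supports only append mode.

```python
def start_csv_to_csv_stream(spark, input_dir: str, output_dir: str,
                            checkpoint_dir: str, schema_ddl: str):
    """Stream new CSV files from input_dir and append them as CSV files to output_dir.

    Hypothetical helper; `spark` is an existing SparkSession.
    """
    incoming = (
        spark.readStream
        .schema(schema_ddl)            # file sources need an explicit schema (DDL string accepted)
        .option("header", "true")
        .csv(input_dir)                # picks up new files that appear in this directory
    )
    return (
        incoming.writeStream
        .format("csv")
        .option("path", output_dir)
        .option("checkpointLocation", checkpoint_dir)  # the file sink requires a checkpoint
        .option("header", "true")
        .outputMode("append")          # the file sink supports append mode only
        .start()
    )


if __name__ == "__main__":
    # In a Zeppelin %pyspark paragraph `spark` is normally predefined;
    # outside Zeppelin, create the session yourself (requires pyspark).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-stream-sketch").getOrCreate()
    query = start_csv_to_csv_stream(
        spark,
        input_dir="/tmp/stream_in",            # hypothetical paths
        output_dir="/tmp/stream_out",
        checkpoint_dir="/tmp/stream_checkpoint",
        schema_ddl="id INT, name STRING",      # hypothetical schema
    )
    query.awaitTermination()
```

To try it, drop a CSV file with matching columns into the input directory while the query runs; each new file ends up as part files in the output directory.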

Spark configuration

Settings to tune:

spark.dynamicAllocation.enabled
spark.dynamicAllocation.initialExecutors
spark.dynamicAllocation.minExecutors
spark.dynamicAllocation.maxExecutors
spark.sql.shuffle.partitions
spark.default.parallelism = spark.executor.instances * spark.executor.cores * 2
spark.sql.files.maxPartitionBytes

Example: if the input is 40 GB, choose enough partitions that each partition is <= 200 MB.
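The sizing rules above can be turned into a small calculation. This is a sketch under stated assumptions: 200 MB is read as 200 MiB, the executor counts are hypothetical, and reusing the input-based partition count for `spark.sql.shuffle.partitions` is an assumption rather than a Spark default. `spark.default.parallelism` mainly affects RDD operations, while DataFrame shuffles use `spark.sql.shuffle.partitions`. The helper names are hypothetical; only `build_session` needs pyspark.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def default_parallelism(executor_instances: int, executor_cores: int, factor: int = 2) -> int:
    """Rule of thumb from the notes: executors * cores per executor * 2."""
    return executor_instances * executor_cores * factor


def partitions_for_input(input_bytes: int, max_partition_bytes: int = 200 * MIB) -> int:
    """Smallest partition count that keeps each partition <= max_partition_bytes."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, -(-input_bytes // max_partition_bytes))


def sizing_conf(executor_instances: int, executor_cores: int, input_bytes: int,
                max_partition_bytes: int = 200 * MIB) -> dict:
    """Collect the settings as strings, ready for SparkSession.builder.config()."""
    return {
        "spark.executor.instances": str(executor_instances),
        "spark.executor.cores": str(executor_cores),
        "spark.default.parallelism": str(default_parallelism(executor_instances, executor_cores)),
        # Assumption: reuse the input-based partition count for shuffles as well.
        "spark.sql.shuffle.partitions": str(partitions_for_input(input_bytes, max_partition_bytes)),
        # Upper bound on bytes packed into one partition when reading files.
        "spark.sql.files.maxPartitionBytes": str(max_partition_bytes),
    }


def build_session(conf: dict, app_name: str = "sizing-sketch"):
    """Apply the settings to a new SparkSession (requires pyspark)."""
    from pyspark.sql import SparkSession  # lazy import so the arithmetic runs without pyspark

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Hypothetical cluster: 10 executors with 4 cores each, 40 GiB of input.
    conf = sizing_conf(executor_instances=10, executor_cores=4, input_bytes=40 * GIB)
    for key, value in conf.items():
        print(f"{key}={value}")
```

With these numbers the default parallelism is 10 * 4 * 2 = 80, and 40 GiB split into partitions of at most 200 MiB needs 205 partitions (40960 / 200 = 204.8, rounded up).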


Spark configuration

PySpark streaming from and to a CSV file in Zeppelin: basic code example

How to access your ClickHouse database with Spark in Python

Link to the book “Spark: The Definitive Guide”