Spark Streaming
Here is an example of very simple streaming from and to files with PySpark: https://gist.github.com/sallos-cyber/a14e03da49cc0c873a651628dba4d096
Key tuning settings:
- spark.dynamicAllocation.enabled
- spark.dynamicAllocation.initialExecutors
- spark.dynamicAllocation.minExecutors
- spark.dynamicAllocation.maxExecutors
- spark.sql.shuffle.partitions
- spark.default.parallelism = spark.executor.instances * spark.executor.cores * 2
- spark.sql.files.maxPartitionBytes

Example: with roughly 40 GB of input bytes, choose the number of partitions so that each partition is at most about 200 MB.
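As a quick sanity check, the partition-sizing rule above can be written out as a small calculation. This is plain Python; the 40 GB input size and 200 MB target come from the note, while the helper name and the executor/core counts are made-up examples:

```python
import math

def recommended_partitions(input_bytes, target_partition_bytes=200 * 1024**2):
    """Smallest partition count that keeps each partition <= the target size."""
    return max(1, math.ceil(input_bytes / target_partition_bytes))

input_bytes = 40 * 1024**3  # 40 GB of input data
print(recommended_partitions(input_bytes))  # -> 205 partitions of <= 200 MB each

# Rule of thumb from the note: parallelism = executors * cores * 2
executors, cores = 10, 4    # hypothetical cluster size
print(executors * cores * 2)  # -> 80
```

With ~205 partitions each holds at most 200 MB of the 40 GB input, which matches the "partition size <= 200 MB" guideline.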
Assumptions: Zeppelin 0.10.0 and Spark 3.1.1. I assume Spark runs in one thread on a single machine (local mode) and Zeppelin runs on the same machine. The SPARK_HOME variable has been set.
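For reference, pointing Zeppelin at a local Spark installation usually comes down to exporting SPARK_HOME before starting Zeppelin. The paths below are placeholders, not taken from the note:

```shell
# Placeholder installation path -- adjust to your machine.
export SPARK_HOME=/opt/spark-3.1.1
export PATH="$SPARK_HOME/bin:$PATH"

# Zeppelin also reads SPARK_HOME from conf/zeppelin-env.sh, e.g.:
# echo 'export SPARK_HOME=/opt/spark-3.1.1' >> "$ZEPPELIN_HOME/conf/zeppelin-env.sh"
```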
Renaming: If you are reading from a topic to which you sent data formatted as JSON, you must deserialize the data, optionally process it, and finally serialize it again before writing it back out.
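A minimal sketch of that read, deserialize, process, serialize cycle with Structured Streaming. The broker address, topic names, schema, and checkpoint path are all placeholders, not values from the note:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, to_json, struct
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("kafka-json-roundtrip").getOrCreate()

# Placeholder schema for the JSON payload.
schema = StructType([
    StructField("name", StringType()),
    StructField("count", IntegerType()),
])

# 1. Read: Kafka delivers the payload as bytes in the `value` column.
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "input-topic")
       .load())

# 2. Deserialize the JSON string into typed columns.
parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("data"))
             .select("data.*"))

# 3. Process (here: a trivial filter as an example).
processed = parsed.where(col("count") > 0)

# 4. Serialize back to a JSON string in `value` and write to another topic.
query = (processed.select(to_json(struct("*")).alias("value"))
         .writeStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "output-topic")
         .option("checkpointLocation", "/tmp/kafka-roundtrip-ckpt")
         .start())
```

Note that the Kafka source/sink requires the `spark-sql-kafka` package on the classpath (e.g. via `--packages` on spark-submit).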
Useful Kafka console commands:
- List all topics
- Consume from a given topic from the console
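Those two tasks map onto Kafka's bundled CLI tools roughly as follows. The broker address and topic name are placeholders:

```shell
# Placeholder broker address and topic name.
BROKER=localhost:9092

# List all topics.
kafka-topics.sh --bootstrap-server "$BROKER" --list

# Consume from a given topic on the console, starting from the beginning.
kafka-console-consumer.sh --bootstrap-server "$BROKER" \
    --topic my-topic --from-beginning
```

The scripts live in the `bin/` directory of the Kafka distribution (on some installs they are named without the `.sh` suffix).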
Assumption: Spark and ClickHouse are up and running. According to the official ClickHouse documentation, we can use the ClickHouse-Native-JDBC driver. To use it with Python we simply download the shaded JAR.
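With the shaded JAR on Spark's classpath, a read from ClickHouse can be sketched like this. The JAR path, database, table, and user are placeholders; the driver class and URL scheme are those documented for ClickHouse-Native-JDBC (native protocol on port 9000):

```python
from pyspark.sql import SparkSession

# The shaded JAR must be on the classpath when the session is created;
# the path below is a placeholder.
spark = (SparkSession.builder.appName("clickhouse-read")
         .config("spark.jars", "/path/to/clickhouse-native-jdbc-shaded.jar")
         .getOrCreate())

# Placeholder database, table, and user.
df = (spark.read.format("jdbc")
      .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
      .option("url", "jdbc:clickhouse://localhost:9000")  # native protocol port
      .option("dbtable", "default.my_table")
      .option("user", "default")
      .load())

df.show()
```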