Configurations for Blaze

Blaze Runtime Parameters

Parameters	Default	Note
spark.blaze.enable	true	Enable/disable blaze engine.
spark.blaze.batchSize	10000	Suggested batch size for arrow batches.
spark.blaze.memoryFraction	0.6	Suggested fraction of off-heap memory used in native execution. Actual off-heap memory usage is expected to be spark.executor.memoryOverhead * spark.blaze.memoryFraction.
spark.blaze.tokio.num.worker.threads	1	Number of worker threads used in the tokio runtime; set to 0 to use the default available parallelism. For CPUs that support hyperthreading, it is recommended to set this value to the number of available physical cores.
spark.blaze.enableInputBatchStatistics	true	Enable extra metrics of input batch statistics.
spark.blaze.partialAggSkipping.enable	true	Enable partial aggregate skipping. (see https://github.com/blaze-init/blaze/issues/327)
spark.blaze.partialAggSkipping.ratio	0.8	Partial aggregate skipping ratio.
spark.blaze.partialAggSkipping.minRows	20000	Minimum number of rows to trigger partial aggregate skipping.
spark.blaze.parquet.enable.pageFiltering	false	Enable Parquet page filtering.
spark.blaze.parquet.enable.bloomFilter	false	Enable Parquet bloom filters.
spark.blaze.forceShuffledHashJoin	false	Replace all sort-merge joins with shuffled-hash joins. Intended only for special benchmarking.
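
As a rough illustration of how spark.blaze.memoryFraction relates to spark.executor.memoryOverhead, the following Python sketch estimates the off-heap budget for native execution and renders a few of the parameters above as spark-submit arguments. The helper names (expected_native_memory_mib, to_conf_args) and the 4096 MiB overhead are hypothetical and chosen only for this example; they are not part of Blaze or Spark.

```python
# Hypothetical helpers for illustration only; not part of Blaze or Spark.

def expected_native_memory_mib(memory_overhead_mib, memory_fraction=0.6):
    """Estimate native off-heap memory as memoryOverhead * memoryFraction."""
    if not 0.0 < memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in (0, 1]")
    return memory_overhead_mib * memory_fraction


def to_conf_args(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        # Spark properties use lowercase true/false for booleans.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        args += ["--conf", f"{key}={text}"]
    return args


# Assumed example values: a 4096 MiB overhead with the documented defaults.
blaze_conf = {
    "spark.blaze.enable": True,
    "spark.blaze.batchSize": 10000,
    "spark.blaze.memoryFraction": 0.6,
    "spark.executor.memoryOverhead": "4096m",
}

print("expected native memory (MiB):", expected_native_memory_mib(4096))
print(" ".join(to_conf_args(blaze_conf)))
```

With these assumed values, roughly 60% of the executor memory overhead would be expected to go to native execution. The real figure depends on the deployment.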

Native Operators Switch

Parameters	Default
spark.blaze.enable.scan	true
spark.blaze.enable.project	true
spark.blaze.enable.filter	true
spark.blaze.enable.sort	true
spark.blaze.enable.union	true
spark.blaze.enable.smj	true
spark.blaze.enable.shj	true
spark.blaze.enable.bhj	true
spark.blaze.enable.bnlj	true
spark.blaze.enable.local.limit	true
spark.blaze.enable.global.limit	true
spark.blaze.enable.take.ordered.and.project	true
spark.blaze.enable.aggr	true
spark.blaze.enable.expand	true
spark.blaze.enable.window	true
spark.blaze.enable.generate	true
spark.blaze.enable.local.table.scan	true
spark.blaze.enable.data.writing	false
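
When a single native operator is suspected of causing a problem, one option is to switch it off and let Spark run its own implementation of that operator. The Python sketch below generates the matching properties in spark-defaults.conf style (key, whitespace, value). The helper names operator_switches and to_defaults_lines are hypothetical; the operator suffixes are copied from the table above.

```python
# Hypothetical helpers for illustration only; not part of Blaze or Spark.

# Operator suffixes taken from the "Native Operators Switch" table.
KNOWN_OPERATORS = {
    "scan", "project", "filter", "sort", "union", "smj", "shj", "bhj",
    "bnlj", "local.limit", "global.limit", "take.ordered.and.project",
    "aggr", "expand", "window", "generate", "local.table.scan",
    "data.writing",
}


def operator_switches(disabled):
    """Map operator suffixes to spark.blaze.enable.<name>=false properties."""
    unknown = sorted(set(disabled) - KNOWN_OPERATORS)
    if unknown:
        raise ValueError(f"unknown operator switch: {unknown}")
    return {f"spark.blaze.enable.{name}": "false" for name in disabled}


def to_defaults_lines(conf):
    """Render properties as spark-defaults.conf lines, sorted by key."""
    return [f"{key} {value}" for key, value in sorted(conf.items())]


# Example: turn off the native sort-merge join and window operators.
for line in to_defaults_lines(operator_switches(["smj", "window"])):
    print(line)
```

Validating the names against the table catches typos early, since a misspelled property would otherwise not correspond to any switch listed here.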

Expression/UDF Switch

Parameters	Default	Note
spark.blaze.enable.caseconvert.functions	true	Enable converting upper/lower functions to native. Special cases may produce different outputs from Spark due to different Unicode versions.
spark.blaze.udf.brickhouse.enabled	true	Enable some native-implemented brickhouse UDFs.
spark.blaze.udf.UDFJson.enabled	true	Enable natively implemented get_json_object/json_tuple. May produce results inconsistent with Spark in special cases (especially with illegal JSON inputs).
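
To show why case conversion is sensitive to Unicode rules in general, the Python sketch below prints two characters whose upper or lower form has a different length from the input. It illustrates Python's own Unicode behavior only and makes no claim about what Spark or Blaze returns for these inputs; the helper name case_lengths is hypothetical.

```python
import unicodedata

# Hypothetical helper for illustration only; it shows Python's behavior,
# not the output of Spark or Blaze.


def case_lengths(text):
    """Return code point counts of (text, text.upper(), text.lower())."""
    return (len(text), len(text.upper()), len(text.lower()))


# Unicode data version bundled with this Python interpreter.
print("Unicode data version:", unicodedata.unidata_version)

for sample in ["ß", "İ"]:
    upper, lower = sample.upper(), sample.lower()
    print(repr(sample), "upper:", repr(upper), "lower:", repr(lower),
          "lengths:", case_lengths(sample))
```

Engines that apply different case-mapping tables or Unicode versions can disagree on inputs like these. If exact parity with Spark matters for such data, setting spark.blaze.enable.caseconvert.functions to false is the switch described above.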