[WIP][ML][SPARK-51379] Move treeAggregate's final aggregation from driver to executor #50142

zhengruifeng · 2025-03-04T01:15:43Z

What changes were proposed in this pull request?

Move treeAggregate's final aggregation from driver to executor.

Commit ee20fbb introduced the following optimization:

Move final iteration of aggregation of RDD.treeAggregate to an executor with one partition and fetch that result to the driver

This PR tries to apply the same optimization to ML's treeAggregate, so that the driver needs less memory.
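The idea can be illustrated without Spark. The following is a minimal pure-Python sketch, not the code in this PR or in Spark itself: `tree_aggregate_sim` and its `final_on_executor` flag are hypothetical names, and the level-merging rule is a simplification. It only counts how many partial aggregates would reach the driver with and without an executor-side final merge.

```python
import math
from functools import reduce


def tree_aggregate_sim(partitions, zero, seq_op, comb_op,
                       depth=2, final_on_executor=False):
    """Simulate a tree aggregation (hypothetical helper, not a Spark API).

    Returns (result, number of partial aggregates sent to the driver).
    """
    if not partitions:
        return zero, 0
    # Executor side: fold each partition into one partial aggregate.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    num = len(partials)
    scale = max(int(math.ceil(num ** (1.0 / depth))), 2)
    # Intermediate levels: merge partials into fewer groups while that
    # still shrinks the tree (simplified stopping rule, an assumption).
    while num > scale + math.ceil(num / scale):
        num //= scale
        groups = [[] for _ in range(num)]
        for i, p in enumerate(partials):
            groups[i % num].append(p)
        partials = [reduce(comb_op, g) for g in groups]
    # Optional final level: merge everything in a single "executor" task,
    # so only one aggregate travels to the driver.
    if final_on_executor and len(partials) > 1:
        partials = [reduce(comb_op, partials)]
    sent_to_driver = len(partials)
    # Driver side: combine whatever arrived.
    return reduce(comb_op, partials), sent_to_driver


def add(a, b):
    return a + b


parts = [[i, i + 1] for i in range(16)]
on_driver = tree_aggregate_sim(parts, 0, add, add)
on_executor = tree_aggregate_sim(parts, 0, add, add, final_on_executor=True)
print(on_driver, on_executor)
```

Both calls return the same sum. The difference is the second tuple element: with the final merge on an executor, the driver receives a single aggregate instead of one per remaining group.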

Why are the changes needed?

To reduce driver memory usage.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI and a manual check.

Preparing the data:

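df = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
# Each union doubles the row count, so 10 passes yield 1024x the original rows.
for i in range(10):
    df = df.union(df)

df.count()
# Persist the enlarged dataset as 1024 Parquet partitions.
df.repartition(1024).write.mode("overwrite").parquet("/tmp/test_data")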

Training a logistic regression model:

from pyspark.ml.classification import LogisticRegression
df = spark.read.parquet("/tmp/test_data")
lr = LogisticRegression()
model = lr.fit(df)

Before vs. after: in each iteration, the data sent to the driver is reduced from 136.1 KiB to 21.3 KiB.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added ML MLLIB labels Mar 4, 2025

test

f0f3930

zhengruifeng force-pushed the ml_final_agg branch from 731c818 to f0f3930 Compare March 4, 2025 02:08
