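Apache Spark 3.0.0 is the first release of the 3.x line. The vote passed on June 10, 2020. This release is based on git tag v3.0.0, which includes all commits up to June 10. Apache Spark 3.0 builds on many of the innovations from Spark 2.x, bringing new ideas and continuing long-term projects that have been in development. With the help of tremendous contributions from the open-source community, this release resolved more than 3400 tickets, the work of over 440 contributors.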
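This year is Spark's 10-year anniversary as an open source project. Since its initial release in 2010, Spark has grown to be one of the most active open source projects. Today, Spark is the de facto unified engine for big data processing, data science, machine learning, and data analytics workloads.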
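Spark SQL is the most active component in this release: 46% of the resolved tickets are for Spark SQL. These enhancements benefit all the higher-level libraries, including Structured Streaming and MLlib, and the higher-level APIs, including SQL and DataFrames. Various related optimizations are added in this release. In the TPC-DS 30TB benchmark, Spark 3.0 is roughly two times faster than Spark 2.4.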
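Python is now the most widely used language on Spark. PySpark has more than 5 million monthly downloads on PyPI, the Python Package Index. This release improves its functionality and usability, including a redesign of the pandas UDF API with Python type hints, new pandas UDF types, and more Pythonic error handling.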
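Here are the feature highlights in Spark 3.0: adaptive query execution; dynamic partition pruning; ANSI SQL compliance; significant improvements in pandas APIs; a new UI for Structured Streaming; up to 40x speedups for calling R user-defined functions; an accelerator-aware scheduler; and SQL reference documentation.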
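To download Apache Spark 3.0.0, visit the downloads page. You can consult JIRA for the detailed changes. We have curated a list of high-level changes here, grouped by major modules.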
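Highlight
Performance Enhancements
SQL Compatibility Enhancements
spark.sql.ansi.enabled (SPARK-28989)
PySpark Enhancements
Extensibility Enhancements
Connector Enhancements
spark.sql.statistics.fallBackToHdfs (SPARK-25474)
Feature Enhancements
Monitoring and Debuggability Enhancements
Documentation and Test Coverage Enhancements
Native Spark App in Kubernetes
Other Notable Changes
Changes of behavior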
Please read the migration guides for each component: Spark Core, Spark SQL, Structured Streaming, and PySpark.
A few other behavior changes that are missed in the migration guide:
org.apache.spark.sql.streaming.ProcessingTime has been removed. Use org.apache.spark.sql.streaming.Trigger.ProcessingTime instead. Likewise, org.apache.spark.sql.execution.streaming.continuous.ContinuousTrigger has been removed in favor of Trigger.Continuous, and org.apache.spark.sql.execution.streaming.OneTimeTrigger has been hidden in favor of Trigger.Once. (SPARK-28199)
DataStreamWriter.foreachBatch is not source compatible for Scala programs. You need to update your Scala source code to disambiguate between a Scala function and a Java lambda. (SPARK-26132)
Programming guides: Spark RDD Programming Guide, Spark SQL, DataFrames and Datasets Guide, and Structured Streaming Programming Guide.
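The trigger and foreachBatch changes noted above can be sketched in Scala as follows. This is a minimal illustration rather than an official migration recipe: it assumes Spark 3.0 with Scala 2.12 on the classpath, and the object name TriggerMigrationSketch, the built-in "rate" source, and the chosen intervals are all hypothetical choices made for the example. Binding the batch handler to an explicitly typed Scala function value is one common way to make the compiler pick the Scala overload of foreachBatch instead of the Java one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.Trigger

object TriggerMigrationSketch {
  // Explicitly typed Scala function value. Passing this value (instead of a
  // bare lambda) leaves only the Scala overload of foreachBatch applicable.
  val logBatch: (DataFrame, Long) => Unit = (batch, batchId) =>
    println(s"batch $batchId has ${batch.count()} rows")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("trigger-migration-sketch")
      .getOrCreate()

    // The built-in "rate" source generates rows continuously; used here only
    // so the sketch needs no external data.
    val input = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    val query = input.writeStream
      // Trigger.ProcessingTime replaces the removed
      // org.apache.spark.sql.streaming.ProcessingTime class.
      // Trigger.Once() and Trigger.Continuous("1 second") are the
      // corresponding replacements for the other two classes.
      .trigger(Trigger.ProcessingTime("2 seconds"))
      .foreachBatch(logBatch)
      .start()

    // Let the query run for roughly ten seconds, then shut down.
    query.awaitTermination(10000L)
    query.stop()
    spark.stop()
  }
}
```

Run locally, this should print one line per micro-batch for about ten seconds; the exact row counts depend on timing.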
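Highlight
Changes of behavior
Please read the migration guide for details.
A few other behavior changes that are missed in the migration guide:
Programming guide: Machine Learning Library (MLlib) Guide.
Changes of behavior
Please read the migration guide for details.
Programming guide: SparkR (R on Spark).
Programming guide: GraphX Programming Guide.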
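Streaming queries that use the dropDuplicates operator may not be able to restart from a checkpoint written by Spark 2.x. This will be fixed in Spark 3.0.1. (SPARK-31990)
io.netty.tryReflectionSetAccessible (SPARK-29923)
Note that if you use S3AFileSystem, e.g. ("s3a://bucket/path"), to access S3 in S3Select or SQS connectors, then everything will work as expected. (SPARK-30968)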
to_timestamp, which parses datetime strings into datetime values using a pattern string. This will be fixed in Spark 3.0.1. (SPARK-31939)
Last but not least, this release would not have been possible without the following contributors: Aaruna Godthi, Adam Binford, Adi Muraru, Adrian Tanase, Ajith S, Akshat Bordia, Ala Luszczak, Aleksandr Kashkirov, Alessandro Bellina, Alex Hagerman, Ali Afroozeh, Ali Smesseim, Alon Doron, Aman Omer, Anastasios Zouzias, Anca Sarb, Andre Sa De Mello, Andrew Crosby, Andy Grove, Andy Zhang, Ankit Raj Boudh, Ankur Gupta, Anton Kirillov, Anton Okolnychyi, Anton Yanchenko, Artem Kalchenko, Artem Kupchinskiy, Artsiom Yudovin, Arun Mahadevan, Arun Pandian, Asaf Levy, Attila Zsolt Piros, Bago Amirbekian, Baohe Zhang, Bartosz Konieczny, Behroz Sikander, Ben Ryves, Bo Hai, Bogdan Ghit, Boris Boutkov, Boris Shminke, Branden Smith, Brandon Krieger, Brian Scannell, Brooke Wenig, Bruce Robbins, Bryan Cutler, Burak Yavuz, Carson Wang, Chaerim Yeo, Chakravarthi, Chandni Singh, Chandu Kavar, Chaoqun Li, Chen Hao, Cheng Lian, Chenxiao Mao, Chitral Verma, Chris Martin, Chris Zhao, Christian Clauss, Christian Stuart, Cody Koeninger, Colin Ma, Cong Du, DB Tsai, Dang Minh Dung, Daoyuan Wang, Darcy Shen, Darren Tirto, Dave DeCaprio, David Lewis, David Lindelof, David Navas, David Toneian, David Vogelbacher, David Vrba, David Yang, Deepyaman Datta, Devaraj K, Dhruve Ashar, Dianjun Ma, Dilip Biswal, Dima Kamalov, Dongdong Hong, Dongjoon Hyun, Dooyoung Hwang, Douglas R Colkitt, Drew Robb, Dylan Guedes, Edgar Rodriguez, Edwina Lu, Emil Sandsto, Enrico Minack, Eren Avsarogullari, Eric Chang, Eric Liang, Eric Meisel, Eric Wu, Erik Christiansen, Erik Erlandson, Eyal Zituny, Fei Wang, Felix Cheung, Fokko Driesprong, Fuwang Hu, Gabbi Merz, Gabor Somogyi, Gengliang Wang, German Schiavon Matteo, Giovanni Lanzani, Greg Senia, Guangxin Wang, Guilherme Souza, Guy Khazma, Haiyang Yu, Helen Yu, Hemanth Meka, Henrique Goulart, Henry D, Herman Van Hovell, Hirobe Keiichi, Holden Karau, Hossein Falaki, Huaxin Gao, Huon Wilson, Hyukjin Kwon, Icysandwich, Ievgen Prokhorenko, Igor Calabria, Ilan Filonenko, 
Ilya Matiach, Imran Rashid, Ivan Gozali, Ivan Vergiliev, Izek Greenfield, Jacek Laskowski, Jackey Lee, Jagadesh Kiran, Jalpan Randeri, James Lamb, Jamison Bennett, Jash Gala, Jatin Puri, Javier Fuentes, Jeff Evans, Jenny, Jesse Cai, Jiaan Geng, Jiafu Zhang, Jiajia Li, Jian Tang, Jiaqi Li, Jiaxin Shan, Jing Chen He, Joan Fontanals, Jobit Mathew, Joel Genter, John Ayad, John Bauer, John Zhuge, Jorge Machado, Jose Luis Pedrosa, Jose Torres, Joseph K. Bradley, Josh Rosen, Jules Damji, Julien Peloton, Juliusz Sompolski, Jungtaek Lim, Junjie Chen, Justin Uang, Kang Zhou, Karthikeyan Singaravelan, Karuppayya Rajendran, Kazuaki Ishizaki, Ke Jia, Keiji Yoshida, Keith Sun, Kengo Seki, Kent Yao, Ketan Kunde, Kevin Yu, Koert Kuipers, Kousuke Saruta, Kris Mok, Lantao Jin, Lee Dongjin, Lee Moon Soo, Li Hao, Li Jin, Liang Chen, Liang Li, Liang Zhang, Liang-Chi Hsieh, Lijia Liu, Lingang Deng, Lipeng Zhu, Liu Xiao, Liu, Linhong, Liwen Sun, Luca Canali, MJ Tang, Maciej Szymkiewicz, Manu Zhang, Marcelo Vanzin, Marco Gaido, Marek Simunek, Mark Pavey, Martin Junghanns, Martin Loncaric, Maryann Xue, Masahiro Kazama, Matt Hawes, Matt Molek, Matt Stillwell, Matthew Cheah, Maxim Gekk, Maxim Kolesnikov, Mellacheruvu Sandeep, Michael Allman, Michael Chirico, Michael Styles, Michal Senkyr, Mick Jermsurawong, Mike Kaplinskiy, Mingcong Han, Mukul Murthy, Nagaram Prasad Addepally, Nandor Kollar, Neal Song, Neo Chien, Nicholas Chammas, Nicholas Marion, Nick Karpov, Nicola Bova, Nicolas Fraison, Nihar Sheth, Nik Vanderhoof, Nikita Gorbachevsky, Nikita Konda, Ninad Ingole, Niranjan Artal, Nishchal Venkataramana, Norman Maurer, Ohad Raviv, Oleg Kuznetsov, Oleksii Kachaiev, Oleksii Shkarupin, Oliver Urs Lenz, Onur Satici, Owen O’Malley, Ozan Cicekci, Pablo Langa Blanco, Parker Hegstrom, Parth Chandra, Parth Gandhi, Patrick Brown, Patrick Cording, Patrick Pisciuneri, Pavithra Ramachandran, Peng Bo, Pengcheng Liu, Petar Petrov, Peter G. 
Horvath, Peter Parente, Peter Toth, Philipse Guo, Prakhar Jain, Pralabh Kumar, Praneet Sharma, Prashant Sharma, Qi Shao, Qianyang Yu, Rafael Renaudin, Rahij Ramsharan, Rahul Mahadev, Rakesh Raushan, Rekha Joshi, Reynold Xin, Reza Safi, Rob Russo, Rob Vesse, Robert (Bobby) Evans, Rong Ma, Ross Lodge, Ruben Fiszel, Ruifeng Zheng, Ruilei Ma, Russell Spitzer, Ryan Blue, Ryne Yang, Sahil Takiar, Saisai Shao, Sam Tran, Samuel L. Setegne, Sandeep Katta, Sangram Gaikwad, Sanket Chintapalli, Sanket Reddy, Sarth Frey, Saurabh Chawla, Sean Owen, Sergey Zhemzhitsky, Seth Fitzsimmons, Shahid, Shahin Shakeri, Shane Knapp, Shanyu Zhao, Shaochen Shi, Sharanabasappa G Keriwaddi, Sharif Ahmad, Shiv Prashant Sood, Shivakumar Sondur, Shixiong Zhu, Shuheng Dai, Shuming Li, Simeon Simeonov, Song Jun, Stan Zhai, Stavros Kontopoulos, Stefaan Lippens, Steve Loughran, Steven Aerts, Steven Rand, Sujith Chacko, Sun Ke, Sunitha Kambhampati, Szilard Nemeth, Tae-kyeom, Kim, Takanobu Asanuma, Takeshi Yamamuro, Takuya UESHIN, Tarush Grover, Tathagata Das, Terry Kim, Thomas D’Silva, Thomas Graves, Tianshi Zhu, Tiantian Han, Tibor Csogor, Tin Hang To, Ting Yang, Tingbing Zuo, Tom Van Bussel, Tomoko Komiyama, Tony Zhang, TopGunViper, Udbhav Agrawal, Uncle Gen, Vaclav Kosar, Venkata Krishnan Sowrirajan, Viktor Tarasenko, Vinod KC, Vinoo Ganesh, Vladimir Kuriatkov, Wang Shuo, Wayne Zhang, Wei Zhang, Weichen Xu, Weiqiang Zhuang, Weiyi Huang, Wenchen Fan, Wenjie Wu, Wesley Hoffman, William Hyun, William Montaz, William Wong, Wing Yew Poon, Woudy Gao, Wu, Xiaochang, XU Duo, Xian Liu, Xiangrui Meng, Xianjin YE, Xianyang Liu, Xianyin Xin, Xiao Li, Xiaoyuan Ding, Ximo Guanter, Xingbo Jiang, Xingcan Cui, Xinglong Wang, Xinrong Meng, XiuLi Wei, Xuedong Luan, Xuesen Liang, Xuewen Cao, Yadong Song, Yan Ma, Yanbo Liang, Yang Jie, Yanlin Wang, Yesheng Ma, Yi Wu, Yi Zhu, Yifei Huang, Yiheng Wang, Yijie Fan, Yin Huai, Yishuang Lu, Yizhong Zhang, Yogesh Garg, Yongjin Zhou, Yongqiang Chai, Younggyu Chun, Yuanjian Li, 
Yucai Yu, Yuchen Huo, Yuexin Zhang, Yuhao Yang, Yuli Fiterman, Yuming Wang, Yun Zou, Zebing Lin, Zhenhua Wang, Zhou Jiang, Zhu, Lipeng, codeborui, cxzl25, dengziming, deshanxiao, eatoncys, hehuiyuan, highmoutain, huangtianhua, liucht-inspur, mob-ai, nooberfsh, roland1982, teeyog, tools4origins, triplesheep, ulysses-you, wackxu, wangjiaochun, wangshisan, wenfang6, wenxuanguan