Below is a complete code example for creating two RDDs in Scala and performing a join operation, for use in IntelliJ IDEA Community Edition:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RDDJoinExample {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration
    val conf = new SparkConf()
      .setAppName("RDD Join Example")
      .setMaster("local[*]") // Run in local mode

    // Create the SparkContext
    val sc = new SparkContext(conf)

    // Create the first RDD of (key, value) pairs
    val rdd1 = sc.parallelize(Seq(
      (1, "Apple"), (2, "Banana"), (3, "Cherry"), (4, "Date"), (5, "Elderberry"),
      (6, "Fig"), (7, "Grape"), (8, "Honeydew"), (9, "Iceberg Lettuce"), (10, "Jackfruit")
    ))

    // Create the second RDD of (key, value) pairs
    val rdd2 = sc.parallelize(Seq(
      (1, "Red"), (2, "Yellow"), (3, "Red"), (11, "Green"), (12, "Orange"),
      (13, "Purple"), (14, "Black"), (15, "White"), (16, "Pink"), (17, "Brown")
    ))

    // Perform the join operation
    val joinedRDD = rdd1.join(rdd2)

    // Print the result
    println("Join result:")
    joinedRDD.collect().foreach(println)

    // Stop the SparkContext
    sc.stop()
  }
}
```

This code produces output similar to the following (the order of the records may vary):

```
(1,(Apple,Red))
(2,(Banana,Yellow))
(3,(Cherry,Red))
```

Code explanation:
1. Two RDDs are created, each containing 10 (key, value) records.
2. The two RDDs overlap on keys 1, 2, and 3.
3. The join operation matches records that share the same key.
4. As a result, 3 matched records are printed; join only keeps keys present in both RDDs (see the outer-join sketch at the end of this answer for keeping unmatched keys).

Make sure your project includes the Spark dependency, for example by adding the following to build.sbt:

```sbt
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.0"
```
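One related pitfall: the project's Scala version must match the Scala version the Spark artifact was built for. As a minimal build.sbt sketch (the exact patch version below is an assumption; adjust it to your installation), it could look like this:

```sbt
// Minimal sketch: Spark 3.3.0 is published for Scala 2.12 and 2.13,
// so scalaVersion must be one of those (2.12.17 is an assumed example)
scalaVersion := "2.12.17"

libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.0"
```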
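If you also want to keep keys that exist in only one of the two RDDs, Spark's pair-RDD API provides leftOuterJoin, rightOuterJoin, and fullOuterJoin. Below is a minimal, self-contained sketch; the object name and the smaller sample data are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RDDOuterJoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("RDD Outer Join Sketch").setMaster("local[*]"))

    // Same shape of data as the example above, just fewer records
    val rdd1 = sc.parallelize(Seq((1, "Apple"), (2, "Banana"), (4, "Date")))
    val rdd2 = sc.parallelize(Seq((1, "Red"), (2, "Yellow"), (11, "Green")))

    // leftOuterJoin keeps every key of rdd1; a missing match becomes None,
    // e.g. (4,(Date,None))
    rdd1.leftOuterJoin(rdd2).collect().foreach(println)

    // fullOuterJoin keeps keys from both sides and wraps both values in Option,
    // e.g. (11,(None,Some(Green)))
    rdd1.fullOuterJoin(rdd2).collect().foreach(println)

    sc.stop()
  }
}
```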