Create a new project in IDEA.
Delete the new project's src directory and create a module.
Add the Spark and Scala dependencies to the parent pom. We develop the model in Scala in our project; Scala is recommended since the development experience is better (Java and Python work as well).
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.shaozhiqi.bigdata</groupId>
    <artifactId>spark-demo01</artifactId>
    <packaging>pom</packaging>
    <version>1.0-SNAPSHOT</version>

    <modules>
        <module>spark-core</module>
    </modules>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <scala.version>2.11.7</scala.version>
        <spark.version>2.4.3</spark.version>
        <encoding>UTF-8</encoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
</project>
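One thing worth double-checking here: scala.version (2.11.7) has to line up with the _2.11 suffix of the spark-core artifact. A tiny sketch of mine (not part of the original post) that prints the Scala version actually on the classpath:

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Should report a 2.11.x version, matching spark-core_2.11 above.
    println(scala.util.Properties.versionString)
  }
}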
Configure the packaging plugins in our module's pom:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>spark-demo01</artifactId>
        <groupId>com.shaozhiqi.bigdata</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>spark-core</artifactId>

    <build>
        <pluginManagement>
            <plugins>
                <!-- Plugin for compiling Scala -->
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.2</version>
                </plugin>
            </plugins>
        </pluginManagement>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <!-- Packaging (fat-jar) plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <configuration>
                    <transformers>
                        <!-- Add Main-Class to the manifest file -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <!-- Set this to the main class you want to run -->
                            <mainClass>com.shaozhiqi.bigdata.spark.WordCount</mainClass>
                        </transformer>
                    </transformers>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
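After mvn package, the shaded jar should carry the Main-Class entry configured above. As a quick sanity check (my own sketch, not from the original post; the jar path assumes Maven's default target/<artifactId>-<version>.jar naming), the manifest can be read back like this:

import java.util.jar.JarFile

object ManifestCheck {
  def main(args: Array[String]): Unit = {
    // Assumed default output path for this module; adjust if your build differs.
    val jar = new JarFile("target/spark-core-1.0-SNAPSHOT.jar")
    println(jar.getManifest.getMainAttributes.getValue("Main-Class"))
    jar.close()
  }
}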
Install the Scala development plugin in IDEA.
Restart IDEA after installation.
Set up the Scala SDK and select it for our newly created module.
Create a new Scala object.
Write the code:
package com.shaozhiqi.bigdata.spark

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // 1. Create the configuration
    val conf = new SparkConf().setAppName("wordcount").setMaster("local[*]")
    // 2. Create the SparkContext
    val sc = new SparkContext(conf)
    // 3. Process the data: count the occurrences of each word.
    //    We want to try this on the cluster, so the textFile argument should be
    //    parameterized; for a local run, use a local absolute path.
    val lines = sc.textFile("G:\\temp\\input.txt")
    val words = lines.flatMap(_.split(" "))
    val keyMap = words.map((_, 1))
    val result = keyMap.reduceByKey(_ + _)
    result.foreach(println)
    // 4. Shut down the connection
    sc.stop()
  }
}
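The comment in step 3 says the textFile argument should be parameterized so the same jar can also run on the cluster. A minimal sketch of that variant, assuming the input path is passed as the first program argument and the master URL is supplied by spark-submit rather than hard-coded (the object name WordCountArgs is my own illustration, not from the original post):

package com.shaozhiqi.bigdata.spark

import org.apache.spark.{SparkConf, SparkContext}

object WordCountArgs {
  def main(args: Array[String]): Unit = {
    // The input path comes from the first program argument; the master URL is
    // expected from spark-submit (--master), so it is not set here.
    val conf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)

    sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .foreach(println)

    sc.stop()
  }
}

The local test below still uses the hard-coded path version above.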
Test it locally; the output is:
(1233,1)
(llll,1)
(hhh,1)
(ddd,2)
(55,2)
(,1)
(kkkk,1)
(jjj,1)
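The (,1) pair shows that consecutive spaces in the input produce empty-string tokens. If that is not wanted, a small tweak of mine (not in the original code) filters them out before counting; it reuses the lines RDD from step 3 above:

// Same pipeline as in step 3, but dropping the empty tokens that
// consecutive spaces produce, so the (,1) pair disappears from the output.
val cleaned = lines.flatMap(_.split(" "))
  .filter(_.nonEmpty)
  .map((_, 1))
  .reduceByKey(_ + _)
cleaned.foreach(println)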
Reposted from: https://www.cnblogs.com/shaozhiqi/p/11535269.html