{"id":5220,"date":"2022-04-14T16:11:32","date_gmt":"2022-04-14T07:11:32","guid":{"rendered":"https:\/\/www.skyer9.pe.kr\/wordpress\/?p=5220"},"modified":"2022-04-19T15:30:33","modified_gmt":"2022-04-19T06:30:33","slug":"spark-streaming-%ea%b0%9c%eb%b0%9c%ed%99%98%ea%b2%bd-%ea%b5%ac%ec%b6%95%ed%95%98%ea%b8%b0","status":"publish","type":"post","link":"https:\/\/www.skyer9.pe.kr\/wordpress\/?p=5220","title":{"rendered":"Spark Streaming \uac1c\ubc1c\ud658\uacbd \uad6c\ucd95\ud558\uae30"},"content":{"rendered":"<h1>Spark Streaming \uac1c\ubc1c\ud658\uacbd \uad6c\ucd95\ud558\uae30<\/h1>\n<p>Spark Streaming \uac1c\ubc1c\uc5d0 \ud544\uc694\ud55c \ucd5c\uc18c \uc124\uc815\uc73c\ub85c \uac1c\ubc1c\ud658\uacbd\uc744 \uad6c\ucd95\ud569\ub2c8\ub2e4.<br \/>\n\ud558\ub098\uc758 \uc11c\ubc84\uc5d0 \ubaa8\ub4e0 \uc694\uc18c(Kafka + Spark + Hadoop) \uc744 \uc124\uce58\ud569\ub2c8\ub2e4.<br \/>\n\uc131\ub2a5\uc774\ub098 \ubcf4\uc548\uc774\uc288 \ub4f1\uc740 \uace0\ub824\ud558\uc9c0 \uc54a\uc2b5\ub2c8\ub2e4.<\/p>\n<p>JDK \ubc84\uc804 \ubb38\uc81c\ub85c \uacc4\uc815\uc740 2\uac1c\ub97c \uc0dd\uc131\ud569\ub2c8\ub2e4.<\/p>\n<h2>JDK \uc124\uce58<\/h2>\n<pre><code class=\"language-bash\">sudo apt-get update\nsudo apt-get install openjdk-8-jdk\nsudo apt-get install openjdk-11-jdk\n\n# java 8 \uc744 \uc120\ud0dd\ud55c\ub2e4.\nsudo update-alternatives --config java<\/code><\/pre>\n<h2>Kafka \uc124\uce58<\/h2>\n<pre><code class=\"language-bash\">sudo adduser kafka\nsudo su - kafka<\/code><\/pre>\n<pre><code class=\"language-bash\">wget https:\/\/dlcdn.apache.org\/kafka\/2.8.1\/kafka_2.13-2.8.1.tgz\ntar xvfz kafka_2.13-2.8.1.tgz\nmkdir kafka_2.13-2.8.1\/logs\nmkdir kafka_2.13-2.8.1\/data<\/code><\/pre>\n<pre><code class=\"language-bash\">vi .bashrc\n......\nexport JAVA_HOME=\/usr\/lib\/jvm\/java-11-openjdk-amd64\n......<\/code><\/pre>\n<h2>Spark, Hadoop \uc124\uce58<\/h2>\n<pre><code class=\"language-bash\">sudo adduser spark\nsudo su - spark<\/code><\/pre>\n<pre><code class=\"language-bash\">wget https:\/\/dlcdn.apache.org\/hadoop\/common\/hadoop-3.2.3\/hadoop-3.2.3.tar.gz\ntar xvfz hadoop-3.2.3.tar.gz\n\nwget https:\/\/dlcdn.apache.org\/spark\/spark-3.2.1\/spark-3.2.1-bin-hadoop3.2.tgz\ntar xvfz spark-3.2.1-bin-hadoop3.2.tgz<\/code><\/pre>\n<pre><code class=\"language-bash\">vi ~\/.bashrc\nexport JAVA_HOME=\/usr\/lib\/jvm\/java-8-openjdk-amd64\nexport HADOOP_HOME=\/home\/spark\/hadoop-3.2.3\/\nexport SPARK_HOME=\/home\/spark\/spark-3.2.1-bin-hadoop3.2\/\nexport PATH=$JAVA_HOME\/bin:$PATH:$HADOOP_HOME\/bin\/:$HADOOP_HOME\/sbin:$SPARK_HOME\/bin:$SPARK_HOME\/sbin\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME\/lib\/native<\/code><\/pre>\n<pre><code class=\"language-bash\">exit\nsudo su - spark\n\nmkdir -p \/home\/spark\/hadoop\/hadoopdata\/hdfs\/namenode\nmkdir -p \/home\/spark\/hadoop\/hadoopdata\/hdfs\/datanode<\/code><\/pre>\n<pre><code class=\"language-bash\">ssh-keygen -t rsa -P &#039;&#039; -f ~\/.ssh\/id_rsa\ncat ~\/.ssh\/id_rsa.pub &gt;&gt; ~\/.ssh\/authorized_keys\nchmod 0600 ~\/.ssh\/authorized_keys<\/code><\/pre>\n<pre><code class=\"language-bash\">sudo su - hduser\n\nvi $HADOOP_HOME\/etc\/hadoop\/core-site.xml\n......\n## &lt;configuration&gt; &lt;\/configuration&gt; \uc0ac\uc774\uc5d0 \uc785\ub825\n&lt;property&gt;\n   &lt;name&gt;fs.default.name&lt;\/name&gt;\n   &lt;value&gt;hdfs:\/\/localhost:9000&lt;\/value&gt;\n&lt;\/property&gt;\n......<\/code><\/pre>\n<pre><code class=\"language-bash\">vi $HADOOP_HOME\/etc\/hadoop\/hdfs-site.xml\n......\n## &lt;configuration&gt; &lt;\/configuration&gt; 
```bash
sudo su - spark

vi $HADOOP_HOME/etc/hadoop/core-site.xml
......
## Enter between <configuration> and </configuration>
<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>
......
```

```bash
vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
......
## Enter between <configuration> and </configuration>
<property>
   <name>dfs.replication</name>
   <value>2</value>
</property>
<property>
   <name>dfs.namenode.secondary.http-address</name>
   <value>localhost:9868</value>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/spark/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/spark/hadoop/hadoopdata/hdfs/datanode</value>
</property>
......
```

```bash
vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
......
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
......
```

```bash
hdfs namenode -format
start-dfs.sh
start-yarn.sh

hadoop fs -mkdir -p /tmp
hadoop fs -chmod 777 /tmp
```

```bash
jps
9664 ResourceManager
9824 NodeManager
10130 Jps
9060 NameNode
9466 SecondaryNameNode
9247 DataNode
```

## Startup Commands

```bash
sudo su - kafka
kafka_2.13-2.8.1/bin/zookeeper-server-start.sh kafka_2.13-2.8.1/config/zookeeper.properties &
kafka_2.13-2.8.1/bin/kafka-server-start.sh kafka_2.13-2.8.1/config/server.properties &
exit

sudo su - spark
start-dfs.sh
start-yarn.sh
```

## Shutdown Commands

```bash
sudo su - spark
stop-yarn.sh
stop-dfs.sh
exit

sudo su - kafka
kafka_2.13-2.8.1/bin/kafka-server-stop.sh
kafka_2.13-2.8.1/bin/zookeeper-server-stop.sh
```

## Additional Kafka Configuration

```bash
vi kafka_2.13-2.8.1/config/zookeeper.properties
......
dataDir=/home/kafka/kafka_2.13-2.8.1/data
......
```

Allow remote access to Kafka, disable automatic topic creation, and enable topic deletion.

**If this is an AWS EC2 instance, the security group must allow connections from the instance's own public IP.**

> When access is allowed by specifying a security group as the source in an AWS security group rule,
> only traffic based on the private IP addresses of the resources
> associated with that security group is permitted.

```bash
sudo su - kafka
vi kafka_2.13-2.8.1/config/server.properties
......
advertised.listeners=PLAINTEXT://54.180.XXX.XXX:9092
auto.create.topics.enable=false
delete.topic.enable=true
log.dirs=/home/kafka/kafka_2.13-2.8.1/logs
......
```

### Creating Kafka Topics

```bash
# With a single broker, the replication factor cannot exceed 1.
kafka_2.13-2.8.1/bin/kafka-topics.sh --create \
    --zookeeper localhost:2181 \
    --replication-factor 1 \
    --partitions 20 \
    --topic test

kafka_2.13-2.8.1/bin/kafka-topics.sh --describe \
    --zookeeper localhost:2181 \
    --topic test

kafka_2.13-2.8.1/bin/kafka-topics.sh --list \
    --zookeeper localhost:2181

kafka_2.13-2.8.1/bin/kafka-topics.sh --alter \
    --zookeeper localhost:2181 \
    --topic test \
    --partitions 40

kafka_2.13-2.8.1/bin/kafka-topics.sh --delete \
    --zookeeper localhost:2181 \
    --topic test
```
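A quick way to confirm the broker and the `test` topic work end to end is the console producer/consumer that ship with Kafka. This check is an extra step beyond the original instructions; it assumes the broker is reachable at `localhost:9092` (if `advertised.listeners` points at a public IP, use that address instead).

```bash
# Type a few lines, then Ctrl-C to stop producing.
kafka_2.13-2.8.1/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test

# In another shell: read everything back from the beginning.
kafka_2.13-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
```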
## Additional Spark Configuration

### Log Level

```bash
sudo su - spark

cp spark-3.2.1-bin-hadoop3.2/conf/log4j.properties.template spark-3.2.1-bin-hadoop3.2/conf/log4j.properties
vi spark-3.2.1-bin-hadoop3.2/conf/log4j.properties
......
log4j.rootCategory=WARN, console
......
```
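With Kafka and Spark both running, a short end-to-end streaming check can be done from `spark-shell` by pulling in the Kafka connector for this Spark build. The snippet below is a minimal sketch, not part of the original steps; it assumes the `test` topic created above and a broker reachable at `localhost:9092` (adjust to your advertised listener), and it stops itself after about 30 seconds.

```bash
# Launch spark-shell with the Structured Streaming Kafka source for Spark 3.2.1 (Scala 2.12).
spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 <<'EOF'
// Read the "test" topic as a stream and echo message values to the console.
val df = spark.readStream.
  format("kafka").
  option("kafka.bootstrap.servers", "localhost:9092").
  option("subscribe", "test").
  option("startingOffsets", "earliest").
  load()
val query = df.selectExpr("CAST(value AS STRING) AS value").
  writeStream.
  format("console").
  start()
query.awaitTermination(30000)   // run for ~30 seconds, then stop
query.stop()
EOF
```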
## Additional Hadoop Configuration

Grant the spark account sudo privileges (so it can run sudo without a password).

```bash
# Choose vim as the editor.
sudo update-alternatives --config editor

sudo visudo
......
spark ALL=(ALL) NOPASSWD:ALL
......
```

## Installing Hive

First, [install MariaDB](https://www.skyer9.pe.kr/wordpress/?p=3495) and [create an account](https://www.skyer9.pe.kr/wordpress/?p=937).

```bash
sudo su - spark

wget https://dlcdn.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
tar xvfz apache-hive-3.1.2-bin.tar.gz
```

```bash
vi ~/.bashrc
......
export HIVE_HOME=/home/spark/apache-hive-3.1.2-bin/
export PATH=$HIVE_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

exit
sudo su - spark
```

```bash
cp $HIVE_HOME/conf/hive-env.sh.template $HIVE_HOME/conf/hive-env.sh

vi $HIVE_HOME/conf/hive-env.sh
......
HADOOP_HOME=$HADOOP_HOME
......
```

```bash
cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml

vi $HIVE_HOME/conf/hive-site.xml
......
## Add the following properties
<configuration>

  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/${user.name}</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/${user.name}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.scratch.dir.permission</name>
    <value>733</value>
    <description>The permission for the user specific scratch directories that get created.</description>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    <description>Whether to include the current database in the Hive prompt.</description>
  </property>

</configuration>
```

```bash
hadoop fs -mkdir -p /user/hive/warehouse
# hadoop fs -chown -R hive /user/hive
```

```bash
# Bug fix: Hive 3.1.2 ships an old Guava that conflicts with Hadoop 3.2.
rm $HIVE_HOME/lib/guava-19.0.jar
cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-27.0-jre.jar $HIVE_HOME/lib/

hive

hive> exit;
```

### Creating the Schema (Derby)

If an error related to SessionHiveMetaStoreClient occurs, run the commands below again.

```bash
# rm derby.log
# rm -rf metastore_db/
schematool -initSchema -dbType derby
```

```bash
hive
......
hive (default)> show tables;
OK
Time taken: 0.857 seconds
......
hive (default)> CREATE TABLE T1 (ID STRING);
hive (default)> INSERT INTO T1(ID) VALUES('aaaaa');
hive (default)> SELECT * FROM T1;
```

### Configuring MariaDB

Download the MySQL JDBC driver (not the MariaDB one).

```bash
wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-5.1.49.zip
unzip mysql-connector-java-5.1.49.zip
cp mysql-connector-java-5.1.49/mysql-connector-java-5.1.49.jar $HIVE_HOME/lib/
```
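The linked posts cover MariaDB installation and account creation. For completeness, here is a minimal sketch (an assumption, not taken from those posts) of creating the metastore database and user with the same values referenced in hive-site.xml below (db_test / testuser / test1234):

```bash
# Create the metastore database and account in MariaDB
# (values match hive-site.xml below; adjust if your setup differs).
sudo mysql -e "CREATE DATABASE IF NOT EXISTS db_test;
CREATE USER IF NOT EXISTS 'testuser'@'localhost' IDENTIFIED BY 'test1234';
GRANT ALL PRIVILEGES ON db_test.* TO 'testuser'@'localhost';
FLUSH PRIVILEGES;"
```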
```bash
vi $HIVE_HOME/conf/hive-site.xml
......
## Add the following properties
<configuration>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/db_test?createDatabaseIfNotExist=true</value>
  <description>JDBC connection string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>testuser</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>test1234</value>
  <description>password to use against metastore database</description>
</property>

</configuration>
```

```bash
schematool -initSchema -dbType mysql

hive
......
hive (default)> show tables;
OK
Time taken: 0.665 seconds

hive (default)> CREATE TABLE T1 (ID STRING);
hive (default)> INSERT INTO T1(ID) VALUES('aaaaa');
hive (default)> SELECT * FROM T1;
hive (default)> DROP TABLE T1;
```
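Optionally, Spark can be pointed at the same Hive metastore. The sketch below is an assumption and is not covered by the steps above: it links hive-site.xml into Spark's conf directory and copies the JDBC driver so `spark-sql` can see the tables created in Hive. Depending on versions, `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` may also need to be set.

```bash
# Make the Hive metastore configuration and JDBC driver visible to Spark.
ln -s $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/hive-site.xml
cp $HIVE_HOME/lib/mysql-connector-java-5.1.49.jar $SPARK_HOME/jars/

# Tables created in Hive should now be visible from Spark.
spark-sql -e "SHOW TABLES;"
```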