Friday, June 10, 2016

How to update Spark on a cluster of PCs?

I need to update Spark on my cluster of nodes because newer releases add libraries I want to use. To upgrade the Spark distribution:

1. For each node, download spark-x.x.x-bin-hadoopx.x.tgz and extract it (example commands are given after this list)
2. Go to the “conf” folder in the extracted directory and configure the master and the workers differently:
(1) For the worker node(s):
a. Generate the file “log4j.properties” from “log4j.properties.template” and change “INFO” to “ERROR” in the line “log4j.rootCategory=INFO, console” to quiet the console output
b. Generate the file “spark-env.sh” from “spark-env.sh.template” and add the line “export SPARK_MASTER_IP=xxx.xxx.xxx.xxx” to specify the master PC's IP address (see the worker configuration example after this list)
(2) For the master node:
a. Create the files “log4j.properties” and “spark-env.sh” exactly as for the worker nodes
b. Generate the file “slaves” from “slaves.template”, comment out the first line “localhost” by adding a # in front of it, and add the workers' IP addresses below, one IP address per line (an example “slaves” file is shown after this list)
3. Start the cluster (see the commands after this list) and open "http://master-IP-address:8080/" in a browser. If everything is fine, you should see a table listing each worker's ID, address, state, cores and memory, with the state marked as "ALIVE".
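
For step 1, here is a minimal sketch, assuming the target release is spark-1.6.1-bin-hadoop2.6 and that the archive is extracted under the home directory (adjust the version, mirror and path to your own situation):

    # run on every node (master and workers)
    wget http://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
    tar -xzf spark-1.6.1-bin-hadoop2.6.tgz -C ~
    cd ~/spark-1.6.1-bin-hadoop2.6/conf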
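
The worker configuration in step 2(1) can then be done from that “conf” folder, for example like this (192.168.1.100 is a placeholder for the master's IP address):

    # log4j.properties: copy the template and lower the console log level from INFO to ERROR
    cp log4j.properties.template log4j.properties
    sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=ERROR, console/' log4j.properties

    # spark-env.sh: copy the template and point the worker at the master
    cp spark-env.sh.template spark-env.sh
    echo 'export SPARK_MASTER_IP=192.168.1.100' >> spark-env.sh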
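
On the master, step 2(2)b amounts to copying “slaves.template” to “slaves” and editing it so that it looks roughly like this (the worker addresses below are placeholders):

    # localhost
    192.168.1.101
    192.168.1.102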
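
Step 3 assumes the standalone cluster is running. With the configuration above it can be started from the master using the scripts bundled with Spark (this requires passwordless SSH from the master to each worker):

    sbin/start-all.sh    # starts the master and every worker listed in conf/slaves
    # then open http://master-IP-address:8080/ in a browser
    sbin/stop-all.sh     # stops the whole cluster again when needed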
