Problem Scenario 86: In continuation of the previous question, please accomplish the following activities (a SparkSQL sketch follows the list).
1. Select the maximum, minimum, average, standard deviation, and total quantity.
2. Select the minimum and maximum price for each product code.
3. Select the maximum, minimum, average, standard deviation, and total quantity for each product code; however, make sure the average and standard deviation are shown with at most two decimal places.
4. Select all product codes and the average price, but only where the product count is greater than or equal to 3.
5. Select the maximum, minimum, average, and total for all the products, for each product code. Also produce the same across all the products.
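A minimal SparkSQL sketch for these five statements, not an official solution: it assumes a Spark 2.x session named spark, a temporary view "products" built from the product.csv data of the related scenarios (columns productID, productCode, name, quantity, price, supplierid), and that the aggregates in step 5 apply to the price column.
// 1. overall aggregates on quantity
spark.sql("""SELECT MAX(quantity), MIN(quantity), AVG(quantity),
             STDDEV(quantity), SUM(quantity) FROM products""").show()
// 2. minimum and maximum price per product code
spark.sql("SELECT productCode, MIN(price), MAX(price) FROM products GROUP BY productCode").show()
// 3. per-code aggregates, average and standard deviation rounded to two decimals
spark.sql("""SELECT productCode, MAX(quantity), MIN(quantity),
             ROUND(AVG(quantity), 2), ROUND(STDDEV(quantity), 2), SUM(quantity)
             FROM products GROUP BY productCode""").show()
// 4. average price only for codes with at least three products
spark.sql("""SELECT productCode, AVG(price)
             FROM products GROUP BY productCode HAVING COUNT(*) >= 3""").show()
// 5. per-code aggregates plus a grand-total row across all products (ROLLUP adds it)
spark.sql("""SELECT productCode, MAX(price), MIN(price), AVG(price), SUM(price)
             FROM products GROUP BY productCode WITH ROLLUP""").show()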
Problem Scenario 25: You have been given the comma-separated employee information below, which needs to be appended to the file /home/cloudera/flumetest/in.txt (so that a tail source can pick it up).
sex,name,city
1,alok,mumbai
1,jatin,chennai
1,yogesh,kolkata
2,ragini,delhi
2,jyotsana,pune
1,valmiki,banglore
Create a Flume conf file using the fastest non-durable channel, which writes the data into the Hive warehouse directory, in two separate tables called flumemaleemployee1 and flumefemaleemployee1 (create the Hive tables for the given data as well). Please use a tail source on the /home/cloudera/flumetest/in.txt file; a sketch of such a configuration follows.
flumemaleemployee1: will contain only male employees' data.
flumefemaleemployee1: will contain only female employees' data.
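One possible configuration, offered only as a sketch: it assumes an agent named agent1 and the default Hive warehouse path /user/hive/warehouse, and it uses a regex_extractor interceptor to copy the leading sex code (1 = male, 2 = female) into an event header so that a multiplexing selector can route events into two memory channels (the fastest, non-durable option). The two Hive tables themselves would still need to be created over these directories.
# agent1.conf -- a sketch, not a verified solution
agent1.sources = src1
agent1.channels = ch1 ch2
agent1.sinks = sink1 sink2
# tail source on the given file
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /home/cloudera/flumetest/in.txt
# copy the leading sex code into a header named "sex"
agent1.sources.src1.interceptors = i1
agent1.sources.src1.interceptors.i1.type = regex_extractor
agent1.sources.src1.interceptors.i1.regex = ^(\\d)
agent1.sources.src1.interceptors.i1.serializers = s1
agent1.sources.src1.interceptors.i1.serializers.s1.name = sex
# route on the header: 1 -> male channel, 2 -> female channel
agent1.sources.src1.selector.type = multiplexing
agent1.sources.src1.selector.header = sex
agent1.sources.src1.selector.mapping.1 = ch1
agent1.sources.src1.selector.mapping.2 = ch2
# memory channels are the fastest non-durable choice
agent1.channels.ch1.type = memory
agent1.channels.ch2.type = memory
# HDFS sinks writing plain text under the Hive warehouse directory
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/hive/warehouse/flumemaleemployee1
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink2.type = hdfs
agent1.sinks.sink2.hdfs.path = /user/hive/warehouse/flumefemaleemployee1
agent1.sinks.sink2.hdfs.fileType = DataStream
# wiring
agent1.sources.src1.channels = ch1 ch2
agent1.sinks.sink1.channel = ch1
agent1.sinks.sink2.channel = ch2
The agent can then be started with flume-ng agent --conf-file agent1.conf --name agent1.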
Problem Scenario 83: In continuation of the previous question, please accomplish the following activities (a SparkSQL sketch follows the list).
1. Select all the records with quantity >= 5000 and name starting with 'Pen'.
2. Select all the records with quantity >= 5000, price less than 1.24, and name starting with 'Pen'.
3. Select all the records which do not have quantity >= 5000 and whose name does not start with 'Pen'.
4. Select all the products whose name is 'Pen Red' or 'Pen Black'.
5. Select all the products whose price is BETWEEN 1.0 AND 2.0 AND quantity BETWEEN 1000 AND 2000.
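A SparkSQL sketch of these filters, assuming the same "products" temporary view described in the product.csv scenario; step 3 is read here as negating both conditions.
spark.sql("SELECT * FROM products WHERE quantity >= 5000 AND name LIKE 'Pen%'").show()
spark.sql("SELECT * FROM products WHERE quantity >= 5000 AND price < 1.24 AND name LIKE 'Pen%'").show()
// step 3: neither condition holds
spark.sql("SELECT * FROM products WHERE NOT (quantity >= 5000) AND name NOT LIKE 'Pen%'").show()
spark.sql("SELECT * FROM products WHERE name IN ('Pen Red', 'Pen Black')").show()
spark.sql("""SELECT * FROM products
             WHERE price BETWEEN 1.0 AND 2.0 AND quantity BETWEEN 1000 AND 2000""").show()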
Problem Scenario 1:
You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.categories
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following activities (a sketch of the sqoop commands follows the list).
1. Connect to the MySQL DB and check the content of the tables.
2. Copy the "retail_db.categories" table to HDFS, without specifying a directory name.
3. Copy the "retail_db.categories" table to HDFS, into a directory named "categories_target".
4. Copy the "retail_db.categories" table to HDFS, into a warehouse directory named "categories_warehouse".
Problem Scenario 78: You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of the orders table: (order_id, order_date, order_customer_id, order_status)
Columns of the order_items table: (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish the following activities (a sketch for steps 2-4 follows the list).
1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective directories p92_orders and p92_order_items.
2. Join these datasets on order_id using Spark and Python.
3. Calculate the total revenue per day and per customer.
4. Calculate the maximum-revenue customer.
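Step 1 can be handled with sqoop import using --target-dir p92_orders and --target-dir p92_order_items. Since the scenario asks for Python, here is a minimal pyspark sketch for steps 2-4; it assumes both imports landed as comma-delimited text files in those directories.
# a sketch, not a verified solution
orders = sc.textFile("p92_orders").map(lambda line: line.split(","))
order_items = sc.textFile("p92_order_items").map(lambda line: line.split(","))
# key orders by order_id and items by order_item_order_id, then join
orders_kv = orders.map(lambda o: (int(o[0]), o))
items_kv = order_items.map(lambda i: (int(i[1]), i))
joined = items_kv.join(orders_kv)          # (order_id, (item, order))
# total revenue per day: sum order_item_subtotal keyed by order_date
rev_per_day = joined.map(lambda kv: (kv[1][1][1], float(kv[1][0][4]))) \
                    .reduceByKey(lambda a, b: a + b)
# total revenue per customer: sum order_item_subtotal keyed by order_customer_id
rev_per_customer = joined.map(lambda kv: (kv[1][1][2], float(kv[1][0][4]))) \
                         .reduceByKey(lambda a, b: a + b)
# customer with the maximum revenue
top_customer = rev_per_customer.takeOrdered(1, key=lambda kv: -kv[1])
print(top_customer)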
Problem Scenario 51: You have been given the code snippet below.
val a = sc.parallelize(List(1, 2,1, 3), 1)
val b = a.map((_, "b"))
val c = a.map((_, "c"))
Operation_xyz
Write a correct code snippet for Operation_xyz which will produce the output below (one possible answer is sketched after the output).
Output:
Array[(Int, (Iterable[String], Iterable[String]))] = Array(
(2,(ArrayBuffer(b),ArrayBuffer(c))),
(3,(ArrayBuffer(b),ArrayBuffer(c))),
(1,(ArrayBuffer(b, b),ArrayBuffer(c, c)))
)
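One operation that produces this shape is cogroup, which groups the values of both pair RDDs by key into a pair of iterables; a sketch against the snippet above:
val d = b.cogroup(c)
d.collect
// Array[(Int, (Iterable[String], Iterable[String]))] as shown in the expected output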
Problem Scenario 14: You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following activities (sqoop export sketches follow the list).
1. Create a CSV file named updated_departments.csv with the following contents in the local file system.
updated_departments.csv
2,fitness
3,footwear
12,fathematics
13,fcience
14,engineering
1000,management
2. Upload this CSV file to the HDFS filesystem.
3. Now export this data from HDFS to the MySQL retail_db.departments table. During the export, make sure existing departments are just updated and new departments are inserted.
4. Now update the updated_departments.csv file with the content below.
2,Fitness
3,Footwear
12,Fathematics
13,Science
14,Engineering
1000,Management
2000,Quality Check
5. Now upload this file to HDFS.
6. Now export this data from HDFS to the MySQL retail_db.departments table. During the export, make sure existing departments are just updated and no new departments are inserted.
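A sketch of the upload and the two sqoop exports, assuming the file is placed at /user/cloudera/updated_departments.csv and that the departments table's key column is department_id.
hdfs dfs -put updated_departments.csv /user/cloudera/updated_departments.csv
# step 3: upsert -- update existing departments and insert new ones
sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --export-dir /user/cloudera/updated_departments.csv \
  --input-fields-terminated-by ',' \
  --update-key department_id --update-mode allowinsert
# step 6: update only -- existing departments are updated, nothing new is inserted
sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --export-dir /user/cloudera/updated_departments.csv \
  --input-fields-terminated-by ',' \
  --update-key department_id --update-mode updateonly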
Problem Scenario 87: You have been given the three files below.
product.csv (create this file in HDFS)
productID,productCode,name,quantity,price,supplierid
1001,PEN,Pen Red,5000,1.23,501
1002,PEN,Pen Blue,8000,1.25,501
1003,PEN,Pen Black,2000,1.25,501
1004,PEC,Pencil 2B,10000,0.48,502
1005,PEC,Pencil 2H,8000,0.49,502
1006,PEC,Pencil HB,0,9999.99,502
2001,PEC,Pencil 3B,500,0.52,501
2002,PEC,Pencil 4B,200,0.62,501
2003,PEC,Pencil 5B,100,0.73,501
2004,PEC,Pencil 6B,500,0.47,502
supplier.csv
supplierid,name,phone
501,ABC Traders,88881111
502,XYZ Company,88882222
503,QQ Corp,88883333
products_suppliers.csv
productID,supplierID
2001,501
2002,501
2003,501
2004,502
2001,503
Now accomplish all the queries given in the solution.
Select the product, its price, and its supplier name where the product price is less than 0.6, using SparkSQL (a sketch follows).
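A minimal SparkSQL sketch for this query, assuming a Spark 2.x session named spark and that the CSV files sit in the current HDFS user directory.
val products = spark.read.option("header", "true").option("inferSchema", "true").csv("product.csv")
val suppliers = spark.read.option("header", "true").option("inferSchema", "true").csv("supplier.csv")
products.createOrReplaceTempView("products")
suppliers.createOrReplaceTempView("suppliers")
spark.sql("""
  SELECT p.name AS product, p.price, s.name AS supplier
  FROM products p JOIN suppliers s ON p.supplierid = s.supplierid
  WHERE p.price < 0.6
""").show()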
Problem Scenario 4: You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.categories
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following activities.
Import the single table categories (subset of data) into a Hive managed table, where category_id is between 1 and 22 (a sketch of the sqoop command follows).
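A sketch of a possible sqoop command, assuming the default Hive database and that the managed table can simply keep the name categories.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table categories \
  --where "category_id between 1 and 22" \
  --hive-import --hive-table categories \
  --create-hive-table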