NPTEL Big Data Computing Assignment 3 Answers 2022:- In this article, we have provided the answers to NPTEL Big Data Computing Assignment 3. Use them only as a reference, and submit your assignment based on your own knowledge.
About Big Data Computing
In today's fast-paced digital world, the amount of data generated every minute has grown tremendously: sensor readings used to gather climate information, posts on social media sites, digital pictures and videos, purchase transaction records, and GPS signals from cell phones, to name a few. Data of this volume, arriving at different velocities and in different varieties, is called big data. Big data analytics enables professionals to convert extensive data, through statistical and quantitative analysis, into powerful insights that can drive efficient decisions.

This course provides an in-depth understanding of the terminologies and core concepts behind big data problems, applications, and systems, and the techniques that underlie today's big data computing technologies. It introduces some of the most common frameworks, such as Apache Spark, Hadoop, and MapReduce; large-scale data storage technologies, such as in-memory key/value storage systems, NoSQL distributed databases, Apache Cassandra, and HBase; and big data streaming platforms, such as Apache Spark Streaming and Apache Kafka Streams, that have made big data analysis easier and more accessible. While discussing these concepts and techniques, we will also look at various applications of big data analytics using machine learning, deep learning, graph processing, and many others. The course is suitable for all UG/PG students and practising engineers/scientists from diverse fields interested in learning about the novel cutting-edge techniques and applications of big data computing.
CRITERIA TO GET A CERTIFICATE
Average assignment score = 25% of the average of the best 6 assignments out of the total 8 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100
Final score = Average assignment score + Exam score
YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF THE AVERAGE ASSIGNMENT SCORE >=10/25 AND THE EXAM SCORE >= 30/75. If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.
You can find Big Data Computing Assignment 3 Answers 2022 below.
NPTEL Big Data Computing Assignment 3 Answers 2022
1. Consider the following statements in the context of Spark:
Statement 1: Spark improves efficiency through in-memory computing primitives and general computation graphs.
Statement 2: Spark improves usability through high-level APIs in Java, Scala, Python and also provides an interactive shell.
a. Only statement 1 is true
b. Only statement 2 is true
c. Both statements are true
d. Both statements are false
Answer:- c
2. True or False?
Resilient Distributed Datasets (RDDs) are fault-tolerant and immutable.
a. True
b. False
Answer:- a
3. In Spark, a ______________________is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.
a. Spark Streaming
b. FlatMap
c. Resilient Distributed Dataset (RDD)
d. Driver
Answer:- c
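As a rough illustration of this idea (not actual Spark code), here is a toy Python sketch in which a collection is split into partitions and a lost partition is rebuilt deterministically from its lineage. All names here are made up for illustration:

```python
# Toy sketch of the RDD idea: data is split into partitions, and each
# partition can be recomputed from its lineage (here, a deterministic
# function over the source) if it is lost. Illustrative only; this is
# not how Spark is implemented.

class ToyRDD:
    def __init__(self, source, num_partitions):
        self.source = list(source)          # lineage: the recipe to rebuild
        self.num_partitions = num_partitions
        self.partitions = [self._compute(i) for i in range(num_partitions)]

    def _compute(self, i):
        # Deterministic recomputation of partition i from the lineage.
        return [x for idx, x in enumerate(self.source)
                if idx % self.num_partitions == i]

    def lose_partition(self, i):
        self.partitions[i] = None           # simulate a machine failure

    def get_partition(self, i):
        if self.partitions[i] is None:      # rebuild the lost partition
            self.partitions[i] = self._compute(i)
        return self.partitions[i]

rdd = ToyRDD(range(10), num_partitions=2)
rdd.lose_partition(0)
print(rdd.get_partition(0))  # rebuilt: [0, 2, 4, 6, 8]
```

Because the collection is read-only and each partition is a pure function of its lineage, losing a partition never loses information, which is exactly the fault-tolerance property the question describes.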
4. Given the following definition of the join transformation in Apache Spark:

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

The join operation joins two datasets: when called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs containing all pairs of elements for each key.

Output the result of joinrdd when the following code is run:

val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
val joinrdd = rdd1.join(rdd2)
joinrdd.collect
Answer:- b
5. True or False?
Apache Spark can potentially run batch-processing programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.
a. True
b. False
Answer:- a
6. ______________ leverages Spark Core fast scheduling capability to perform streaming analytics.
a. MLlib
b. GraphX
c. RDDs
d. Spark Streaming
Answer:- d
7. ____________________ is a distributed graph processing framework on top of Spark.
a. GraphX
b. MLlib
c. Spark streaming
d. All of the mentioned
Answer:- a
8. Which of the following is the simplest type of NoSQL database?
a. Wide-column
b. Key-value
c. Document
d. All of the mentioned
Answer:- b
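To see why key-value stores are considered the simplest model, here is a minimal, hypothetical sketch in Python: the entire API is just put/get/delete on opaque values, with no schema or query language. The class and method names are illustrative, not any real product's API:

```python
# Minimal sketch of the key-value model, the simplest NoSQL style:
# opaque values addressed only by key, with put/get/delete as the
# whole interface. Illustrative only.

class KVStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KVStore()
store.put("user:42", {"name": "Asha"})
print(store.get("user:42"))
```

Wide-column and document stores layer extra structure (column families, queryable fields) on top of this basic idea, which is why key-value is the simplest of the three.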
9. Consider the following statements:
Statement 1: Scale out means grow your cluster capacity by replacing with more powerful machines.
Statement 2: Scale up means incrementally grow your cluster capacity by adding more COTS machines (Components Off the Shelf).
a. Only statement 1 is true
b. Only statement 2 is true
c. Both statements are false
d. Both statements are true
Answer:- c (Both statements are false: scale out means adding more COTS machines, while scale up means replacing machines with more powerful ones.)
10. Point out the incorrect statement in the context of Cassandra:
a. It is originally designed at Facebook
b. It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
c. It is a centralized key-value store
d. It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing
Answer:- c
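The ring-based placement rule mentioned in option (d) can be sketched in a few lines of Python: each key is hashed onto a ring and stored on the first node at or after its position, with no finger tables or routing state. The node names, ring size, and hash choice below are assumptions for illustration only:

```python
import hashlib
from bisect import bisect_right

# Illustrative sketch of ring-based DHT placement: hash every node and
# every key onto the same ring, then store a key on the first node
# clockwise from its position (wrapping around). No finger tables or
# multi-hop routing are needed for lookup.

RING_SIZE = 2 ** 16

def ring_pos(s):
    # Map a string to a position on the ring (hash choice is arbitrary).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

nodes = ["node-a", "node-b", "node-c"]
ring = sorted((ring_pos(n), n) for n in nodes)

def owner(key):
    pos = ring_pos(key)
    # First node at or after the key's position, wrapping at the end.
    idx = bisect_right([p for p, _ in ring], pos) % len(ring)
    return ring[idx][1]

print(owner("user:42"))
```

Because every member knows the full ring, any node can compute a key's owner directly, which is why such a design is decentralized rather than a centralized key-value store.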
NPTEL Big Data Computing Assignment 3 Answers 2021
Q1. In Spark, a ______________________is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.
(A) Spark Streaming
(B) FlatMap
(C) Driver
(D) Resilient Distributed Dataset (RDD)
Ans:- (D) Resilient Distributed Dataset (RDD)
Q2. Given the following definition about the join transformation in Apache Spark:
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
The join operation joins two datasets: when called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs containing all pairs of elements for each key.
Output the result of joinrdd, when the following code is run.
val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
val joinrdd = rdd1.join(rdd2)
joinrdd.collect
(A) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))
(B) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))
(C) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
(D) None of the mentioned
Ans:- (C) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
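The inner-join semantics behind this answer can be checked with a plain-Python simulation (ordinary Python, not PySpark): only keys present on both sides survive, with every pairing of their values, which is why "e" and "h" are dropped:

```python
# Plain-Python simulation of the inner-join semantics of Spark's join
# transformation on (K, V) / (K, W) pairs, using the quiz data.

rdd1 = [("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54)]
rdd2 = [("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64)]

def inner_join(left, right):
    out = []
    for k, v in left:
        for k2, w in right:
            if k == k2:                 # keep only keys on BOTH sides
                out.append((k, (v, w)))
    return out

joined = inner_join(rdd1, rdd2)
print(joined)
# "e" appears only in rdd1 and "h" only in rdd2, so neither survives;
# "m" and "s" each contribute 2 x 2 = 4 pairs, for 8 results in total.
```

This matches option (C): eight (K, (V, W)) pairs over the shared keys "m" and "s" only.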
Q3. Consider the following statements in the context of Spark:
Statement 1: Spark improves efficiency through in-memory computing primitives and general computation graphs.
Statement 2: Spark improves usability through high-level APIs in Java, Scala, Python and also provides an interactive shell.
(A) Only statement 1 is true
(B) Only statement 2 is true
(C) Both statements are true
(D) Both statements are false
Ans:- (C) Both statements are true
Q4. True or False?
Resilient Distributed Datasets (RDDs) are fault-tolerant and immutable.
(A) True
(B) False
Ans:- (A) True
Q5. Which of the following is not a NoSQL database?
(A) HBase
(B) Cassandra
(C) SQL Server
(D) None of the mentioned
Ans:- (C) SQL Server
Q6. True or False?
Apache Spark can potentially run batch-processing programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.
(A) True
(B) False
Ans:- (A) True
Q7. ______________ leverages Spark Core fast scheduling capability to perform streaming analytics.
(A) MLlib
(B) Spark Streaming
(C) GraphX
(D) RDDs
Ans:- (B) Spark Streaming
Q8. ____________________ is a distributed graph processing framework on top of Spark.
(A) MLlib
(B) Spark streaming
(C) GraphX
(D) All of the mentioned
Ans:- (C) GraphX
Q9. Point out the incorrect statement in the context of Cassandra:
(A) It is a centralized key-value store
(B) It is originally designed at Facebook
(C) It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
(D) It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing
Ans:- (A) It is a centralized key-value store
Q10. Consider the following statements:
Statement 1: Scale out means grow your cluster capacity by replacing with more powerful machines.
Statement 2: Scale up means incrementally grow your cluster capacity by adding more COTS machines (Components Off the Shelf).
(A) Only statement 1 is true
(B) Only statement 2 is true
(C) Both statements are false
(D) Both statements are true
Ans:- (C) Both statements are false
Big Data Computing Assignment 3 Answers 2021:- We do not claim 100% accuracy for these answers; they are based on our own knowledge. By posting them we are only trying to help students, so we urge you to do your assignment on your own.