Big Data Analytics Tutorial | Big Data Analytics for Beginners | Intellipaat

Hey everyone, welcome to this session by Intellipaat. Driving a data-driven
business to success using the tons and tons of data we have in today's world is
a challenge, and big data analytics aims to do just this. IT companies ranging
from startups to the Fortune 500, like Apple, Google, Facebook and so many
more, require big data analytics, and in today's session we're going to check
out just that. Before we begin, make sure to subscribe to Intellipaat's YouTube
channel to never miss an update from us.

Here's the agenda for this session. We'll start with a quick introduction to
big data analytics and check out why it is required. After that, we'll check
out the tools involved in the world of big data analytics and the domains it
has spread into, and then we'll look at some use cases of big data analytics.
Guys, if you have any queries, make sure to put them down in the comment
section below and we'd love to help you out. And in case you're interested in
an end-to-end certification course, Intellipaat provides the Big Data Hadoop
certification training program, where you can master the concepts and earn a
certificate. So, without further ado, let's begin with the class.
So, coming to the first point on the agenda: an introduction to big data
analytics. What is big data to begin with? The simplest answer I can think of
is that big data means extremely large data sets that may be analyzed
computationally. And why is this done? Basically to reveal hidden patterns,
trends and associations, especially those relating to human behavior and
interactions. As the name literally suggests, big data is a lot of data, and we
need to filter through all of it to make sense of it. That's the simplest
answer we can come up with for what big data is.

Then what is big data analytics? As I just mentioned, it is the process of
examining large and varied data sets to uncover information: hidden patterns,
market trends and customer preferences, for example. All of this can be pulled
out of data that is very large. So why should this be done?
Well, this is done to help organizations take well-informed decisions, because
at the end of the day they're seeing terabytes' worth of data, and if they
cannot make sense of it, then collecting it does them no good.

One quick thing you need to know at this point is the difference between
analysis and analytics. Analysis, in simple terms, is making use of past data
and deriving results from data that is already present. Analytics is something
different: here we use the data we have now to obtain future insights, future
trends and future details. That's the basic difference between analysis and
analytics, guys.

So, coming to know more about big data, the first thing you need to know about
is the sheer amount of data. We're all aware of the expense of storing and
handling huge amounts of data, right? Storing data is extremely expensive, and
handling it is just as expensive.

Then comes the variety of the data. There are many types of data if you think
about it: structured, semi-structured and unstructured. Structured data can be
something in a table or an Excel sheet. Unstructured data can be videos,
images, and text that doesn't make sense as soon as you look at it, and much
more. All of this data comes from a variety of sources as well: it might be
Twitter, it might be Facebook, it might be data from across the globe.

The third thing you need to know is that access and processing speed is another
very big concern when it comes to big data. Let's say you have an input/output
channel that can move 100 megabytes per second, and you need to process 2
terabytes' worth of data. Just moving that relatively small amount of data
through the channel will take you about 6 hours. Yes, it's going to take 6
hours for just 2 terabytes of data.

Now think about this. Let's consider the example of Facebook. Facebook has 2.3
billion monthly active users and counting, and this number is always rising.
Around 250 billion photos have been uploaded to Facebook since it came out,
which equates to about 350 million photos uploaded per day. So how much data do
the people on Facebook generate? 4 petabytes of data. You can check out what a
petabyte is: it's a one followed by six zeros, and that is in gigabytes. That's
how much one petabyte is, and Facebook is generating 4 petabytes every single
day. So imagine how you would go about handling all of this data, guys.

This brings us to the next case, which is Instagram, and I'm sure most of us
here use Instagram. There are 40 billion photos shared in total, 95 million
photos every single day and 500 million videos per day, and this again brings
us to around petabytes' worth of data per day. What's a petabyte again? We just
checked that out.

Just a quick info, guys: if you're planning to get certified, Intellipaat
provides the Big Data Hadoop training program, where you can master the
concepts thoroughly and earn a certificate. More details are given in the
description box.
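
To make the numbers from the speed and petabyte discussion concrete, here is a small back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000 GB) and a channel running at a constant rate, which a real network will not quite do. One thing it shows: the six-hour figure for 2 terabytes works out when the channel moves 100 megabytes per second; at 100 megabits per second the same transfer would take closer to 44 hours. The 4-petabyte case on a single channel is a hypothetical added only for scale.

```python
# Decimal (SI) units: an assumption that matches "1 TB = 1000 GB".
GB = 10**9
TB = 10**12
PB = 10**15


def transfer_hours(size_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to move size_bytes at a constant rate."""
    return size_bytes / bytes_per_second / 3600


# 2 TB over a channel that sustains 100 megabytes per second.
hours_at_100_megabytes = transfer_hours(2 * TB, 100 * 10**6)

# The same 2 TB over 100 megabits per second (8 bits per byte).
hours_at_100_megabits = transfer_hours(2 * TB, 100 * 10**6 / 8)

# One petabyte expressed in gigabytes: a 1 followed by six zeros.
gb_per_pb = PB // GB

# Hypothetical: one day of data at 4 PB pushed through a single 100 MB/s channel.
days_for_4_pb = transfer_hours(4 * PB, 100 * 10**6) / 24

print(f"2 TB at 100 MB/s:   {hours_at_100_megabytes:.1f} hours")
print(f"2 TB at 100 Mbit/s: {hours_at_100_megabits:.1f} hours")
print(f"1 PB = {gb_per_pb:,} GB")
print(f"4 PB at 100 MB/s:   {days_for_4_pb:.0f} days on one channel")
```

The last line is the reason a single machine is not an option here: the data has to be spread out and processed in parallel.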
Well, now let's get back to the session. As we saw, a petabyte is a one
followed by six zeros in gigabytes, guys, so this is a lot of data for a social
media platform. Coming to Twitter, there are six thousand tweets every single
second, which adds up to about 500 million tweets a day. It takes a huge amount
of skill to handle data at that scale, and Twitter sees terabytes' worth of
data every single day as well. Again, one terabyte is 1,000 gigabytes. Think
about all the complexity behind a simple tweet, and how Twitter goes about
analyzing, storing and maintaining that data.

This should bring us to why big data analytics is actually needed, and I can
give you five very important reasons why handling big data well matters.

The first is revenue generation. As we've already discussed, storing and
processing data can be extremely expensive, and if we can save money there,
that is pretty much as good as generating revenue for the company through cost
cutting and savings.

Then there's effective marketing: making sure your data is well structured, and
that the data you handle and process ends up easy enough to be marketable to
and understood by your customers. This is again a very big part of big data
analytics.

Then there's providing very good customer service. Because you have lots of
data on which you can perform lots of analytics, you can give your customer a
very customized experience, and this is another important aspect of the
concept.

Then operational efficiency. People say time is money; I think in the world of
computer science, data is money, because for a data-driven company, the better
your data is, the more efficient you become. That is what operational
efficiency refers to.

And then it comes to competitive advantage. This point can be elaborated beyond
measure, but to give you the simplest view: think about all the data being
handled by the social media companies I just told you about, Facebook, Twitter,
Instagram and so on, and then think about how you can have an edge over the
others by understanding data better, performing better analytics on it, and
taking that number one spot, or a podium spot, among the top companies. That
provides a very good edge in terms of competitive advantage.

Elaborating on this: in terms of revenue generation, technologies such as
Hadoop and cloud-based analytics bring very good cost advantages, because they
help in storing large amounts of data and provide efficient ways of presenting
the data, performing analytics and doing business. If you do not know what
Hadoop is, I'll be telling you about it in the next couple of slides, so make
sure you stick around for that.

The next point is better decision making. With the speed of Hadoop and
in-memory analytics, combined with the ability to analyze new sources of data,
businesses today are able to analyze information as soon as they get it. This
saves time and helps them make better decisions based on everything they've
learned.

This leads to new products as well. You can know what your customer wants at
this point in time and provide the customized experiences I just told you
about, using the power of analytics. So with big data analytics, more companies
are creating new products to meet customer needs, all the time, every single
day, in every data-driven business, guys.
So this brings us to the tools that are essential in the world of big data
analytics. As I've told you, we have Hadoop, Hive, Storm, Cassandra, Spark,
MongoDB and various other tools, and these are the big guns, as they would say,
in the field of big data, guys.

Coming to Hadoop: Hadoop is one of the best-known big data frameworks, and it
comes from the Apache Software Foundation. What it does is allow distributed
processing of huge data sets across a cluster of computers. This tells you that
instead of storing all your data on one machine, you can split it and spread it
across any number of machines connected over a network, or a cluster, to use
the exact term, and then use it later. This makes up the distributed processing
part of it, and it is designed to scale up from one server to thousands of
machines.
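
The split-the-data-across-machines idea can be sketched in plain Python. This is not Hadoop's API: the "machines" here are just lists in one process, and the function names (`split_into_partitions`, `map_count_words`, `reduce_counts`) are made up for illustration. What it shows is the map-then-reduce pattern that Hadoop's MapReduce is built around: each node works only on its own slice, and the partial results are merged at the end.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(records, n_nodes):
    """Deal records round-robin into n_nodes partitions, one per pretend machine."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def map_count_words(partition):
    """Work done locally on one node: count the words in its own partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Merge the per-node counts into one overall count."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


# Hypothetical input data, standing in for a much larger log file.
logs = [
    "big data is big",
    "data needs analytics",
    "analytics drives decisions",
    "big decisions need data",
]

partitions = split_into_partitions(logs, n_nodes=3)
totals = reduce_counts(map_count_words(p) for p in partitions)

# The answer does not depend on how many nodes the data was spread over.
single_node_totals = reduce_counts(
    map_count_words(p) for p in split_into_partitions(logs, n_nodes=1)
)

print(totals.most_common(3))
```

Because the result is the same for one partition or many, the same job can run on a single server or be scaled out, which is the property the session describes.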
So if you're using Hadoop, you can use it on one machine as well as scale up to
thousands and more, guys.

This brings us to Storm. Storm is another free and open-source big data
computation system. Storm is very efficient at handling and processing a lot of
data, but where it has a strong footing is that it can give you very good
analytics in real time, and it has amazing fault tolerance capabilities along
with its real-time computation capabilities. Having both of these is essential,
because it takes a lot of power, time and energy to build a system that gives
you real-time processing. We're talking about data in terms of terabytes,
petabytes and more, so if you can pick something out of that in real time and
hand it over, that's amazing, right?
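
To give a feel for what "real time" means here, below is a minimal plain-Python stand-in, not Storm's API. It processes events one at a time as they arrive and keeps counts only for a recent time window, so an answer is available at any moment without re-reading old data. The class name, the 10-second window and the sample hashtags are all assumptions for illustration, and the sketch does not attempt the fault tolerance that Storm provides.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Keeps per-key counts for events seen in the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()      # (timestamp, key) pairs, oldest first
        self.counts = Counter()

    def add(self, timestamp, key):
        """Record one event, then drop anything that fell out of the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n=1):
        """Most frequent keys inside the current window."""
        return self.counts.most_common(n)


# Hypothetical stream of (seconds, hashtag) events.
stream = [(0, "#bigdata"), (1, "#hadoop"), (2, "#bigdata"), (11, "#spark"), (12, "#spark")]

window = SlidingWindowCounter(window_seconds=10)
for ts, tag in stream:
    window.add(ts, tag)

print(window.top(1))
```

Each event is handled in a small, bounded amount of work, which is what lets this style of processing keep up with a continuous stream.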
So next up we have Cassandra. Apache Cassandra is a database tool that is very
widely used to provide effective management of large amounts of data, as we
have been stressing for a while now. It also supports scaling. How does it do
that? It supports scaling by replicating data across multiple data centers.
Alongside scalability you get extremely good fault tolerance and very low
latency, so it might not be real time, but you can get very low latency when
working with Cassandra.

Hive is another very famous tool that is widely used as well. Hive is an
open-source big data tool that allows programmers to analyze large data sets
stored on Hadoop, and it helps with querying and managing them. Hive talks to
our database tools as well as to Hadoop, so we can consider Hive the middleman
when we go about analyzing and querying data sets, guys. So this brings us to
Spark.
Well, I am sure many of us here have heard of Spark, right? If you haven't,
this is what it does. It is another open-source big data tool, which came into
the picture to fill the gaps of Apache Hadoop when it comes to data processing.
When the two are compared directly on data processing, Spark has some clear
advantages, but Spark has its own use cases and Hadoop has its own.

So what does Spark do? Spark can handle batch data processing, where data is
split into batches and you interpret, analyze and process each batch, and it
supports real-time data processing as well. Spark also does something amazing
that we call in-memory data processing: it keeps the data it is working on in
memory and does not have to talk to a hard disk every single time, which
greatly speeds up processing. This is the one thing I really like about Spark,
the in-memory data processing feature it provides to us as users.

Coming to MongoDB: MongoDB is again a very famous tool. It is an open-source
NoSQL database, and it is cross-platform compatible as well. Cross-platform
compatibility is vital in big data, because there are lots of tools apart from
the ones I've mentioned, and making sure all of your tools can talk to all the
operating systems, clusters and nodes is extremely important. MongoDB has a
plethora of features as well, guys. So who should use MongoDB? It is ideal for
businesses that need really fast, real-time data analytics, which is what you
need when you have to take instant decisions, and it is also ideal for users
who want a good data-driven experience from the large amounts of data they or
their business have. So this brings us to the domains in which
big data analytics has its feet firmly planted, guys. The first big arena, in
my opinion, out of the six arenas we'll be discussing, is the world of life
sciences. We already know that clinical research in the field of medicine can
be very slow, and it can be an extremely expensive process as well. With big
data analytics we can bring in a lot of advanced analytics and artificial
intelligence, and we also have something called IoMT, the Internet of Medical
Things. All of these, along with all the data, unlock the potential to improve
speed and efficiency to a very large degree. How is this done? By delivering
more intelligent, automated solutions that make use of data, and by knowing how
to handle lots of data at the same time, guys.

The second arena where handling big data is essential is the world of banking,
because when you come to think about it, banks have a lot of data about a lot
of their users and customers. When they gather and access the analytical
insights you can draw from such large volumes of data, it helps them make very
good financial decisions. Let's say a bank expects customers to walk in at
9:30, so it opens its doors at 9:30, but it doesn't realize it is losing
customers who might want to come in at 9:00 a.m. With data like this, the bank
could see that it might drive much more business if it opened at 9:00 a.m.,
just half an hour earlier. That was a very simple example. When you talk about
security in the world of banking, analytics is required there too, guys,
because it gives banks access to all the information they need at that point in
time. It also eliminates a lot of redundant tools and systems in banking and
reduces overlap, guys.

This brings us to the third arena, the field of manufacturing. When you come to
think about it, manufacturers deal with everything from complex supply chains
and motion applications to labor constraints and equipment breakdowns, all at
the same time. This makes big data essential in the manufacturing industry,
because when you're dealing with multiple teams and multiple requirements at
once, a lot of data can bring in a lot of clarity. It has allowed competitive
organizations to discover new cost-saving opportunities and revenue generation
techniques that were not available before, just by making use of big data
analytics, guys. So the next arena is healthcare.
Well, when you think about healthcare, think about the data that is generated:
lots and lots of patient data, health plans and insurance information, all of
which can be difficult to manage for hospitals and other healthcare
organizations. Having key insights into all of it, once analytics has been
applied, makes things much simpler, and this is why big data analytics
technology is extremely important in the world of healthcare. The amount of
data can be very small or very large, and what matters here is that it can be
structured or unstructured. If all of this information can be analyzed quickly,
healthcare providers can offer diagnoses and treatment options that can be
life-saving, right at the instant they are required. This makes the field of
healthcare far more efficient and much faster.

This brings us to the next arena, which is government, guys. When you think
about the challenges most governments face, it's basically tightening the
budget without compromising on quality or productivity for the nation. This can
get pretty troublesome. Take law enforcement agencies: they're struggling to
bring crime rates down because they do not have enough resources to work with.
With the help of big data analytics, you can streamline operations while giving
the agency a more holistic view, both a good overview and an in-depth view, of
criminal activity. Then they can focus the resources they have on what's
required, instead of spending them on everything, much of which at the end of
the day might not be helpful. This brings us to the next sector, which is
retail.
Well, the people who sell the products, the retailers, need to know exactly
what the customer wants and what the customer needs. And if they do not know,
the shoppers know it for sure. But a retailer who's armed with huge amounts of
data from customer loyalty programs, analysis of customer buying habits and a
lot of other sources not only has an in-depth understanding of the customers,
but can also use all of this data to predict trends, recommend new products and
boost profitability, which is the most important driving aspect of a retail
business, guys.
This quickly brings us to the use cases of big data analytics in today's world.
Here is one of the amazing use cases I could think of. Think about this the
next time you go on Amazon Prime, Netflix or any other streaming service. Video
recommendation engines are integrated so subtly that we do not even notice them
working, but sometimes we are taken aback because the recommendations are so
good. Let's say you're thinking, hey, I want to watch a comedy movie today, and
you're browsing a couple of comedy movies. Using a big data architecture, the
video recommendation engine then gives you an amazing recommendation for the
next comedy movie, the very one you thought you wanted to watch just now. That
is something amusing, and you'd be like, wow, this is exactly what I was
thinking, and my computer presented it to me. So with big data analytics we can
achieve very good results in video recommendation engines.

We all have set-top boxes, and we all have subscriptions to these streaming
websites, so here's how big data works there. We use an open-source Hadoop
architecture as the big data foundation. It collects raw user data from
on-demand video, set-top box activity logs, scheduled recordings, media
catalogs, your past recommendations, your past viewings and much more. All of
this is taken in, processed and analyzed using the Hadoop framework, guys. At
the end, we feed the results into a search engine, which then delivers unique
recommendations to the user through the browser interface. This gives you a
tailor-made experience, and this is the best part about big data, in my
opinion, guys.

You can quickly check out the architecture of how this works on the right-hand
side of your screen as well. All of the data goes into Apache Hadoop, and after
Hadoop it is separated out into Memcache and search engines. Then you have the
web applications and the APIs, and at the end of the day what the user sees is
the web browser, mobile web applications and so on, which use the analysis from
the back end to give the customer a tailored experience, guys. So this brings
us to the next amazing use case of big data.
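
The "analyze viewing logs, then serve recommendations" step of the video use case can be sketched in a few lines. This is a toy co-occurrence recommender, which is one simple technique among many and not necessarily what any streaming service runs; the viewing logs, titles and function names are invented for illustration. The idea: titles that are often watched by the same people get recommended to each other's viewers.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(view_logs):
    """Count how often two titles were watched by the same user."""
    co = defaultdict(Counter)
    for titles in view_logs.values():
        for a, b in combinations(sorted(set(titles)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(user, view_logs, co, k=2):
    """Suggest up to k unseen titles that co-occur most with what the user watched."""
    seen = set(view_logs[user])
    scores = Counter()
    for title in seen:
        for other, n in co[title].items():
            if other not in seen:
                scores[other] += n
    return [title for title, _ in scores.most_common(k)]


# Hypothetical viewing history, standing in for set-top box activity logs.
view_logs = {
    "ana": ["Comedy A", "Comedy B", "Comedy C"],
    "ben": ["Comedy A", "Comedy B"],
    "cy": ["Comedy A", "Comedy C", "Drama X"],
    "dee": ["Comedy A"],
}

co = build_cooccurrence(view_logs)
suggestions = recommend("dee", view_logs, co, k=2)
print(suggestions)
```

In the architecture described in the session, the counting step would run as a batch job over the full logs and the finished scores would be pushed to a cache or search engine for fast lookup; here both steps simply run in memory.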
Big data is also used to hire candidates. You know that if there's a job
opening, there will be a lot of people applying. Even if the company is hiring
for two positions, it is not just taking two people through the interview
process; there can be anywhere from 20 to 200 to 2,000 or more people applying
for the job. If you want to automate this process and make the job of a hiring
manager easier, you can use big data analytics for it, guys. We look beyond the
normal keywords you would see in a resume and go in depth into what we call
semantic analysis, looking at both structured and unstructured content. You can
pick up exact details and metrics such as the job titles the person has held,
experience, certifications, the industries and companies he or she has worked
in, the skills he or she possesses and much more. You can take all of these
and directly compare them to a job description without the help of a human.
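
Here is a minimal sketch of that last comparison step, assuming the details (skills, years of experience) have already been extracted from the resumes. The hard part, the semantic analysis of unstructured resume text, is not shown. The field names, the sample candidates and the scoring rule (fraction of required skills matched) are all assumptions for illustration.

```python
def skill_match(resume_skills, job_skills):
    """Fraction of the job's required skills found in the resume (0.0 to 1.0)."""
    required = {s.strip().lower() for s in job_skills}
    offered = {s.strip().lower() for s in resume_skills}
    if not required:
        return 0.0
    return len(required & offered) / len(required)


def rank_candidates(candidates, job, min_years=0):
    """Drop candidates below min_years, then sort by skill match, best first."""
    ranked = []
    for c in candidates:
        if c["years_experience"] < min_years:
            continue
        ranked.append((skill_match(c["skills"], job["skills"]), c["name"]))
    # Highest score first; ties broken alphabetically by name.
    return sorted(ranked, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical job description and already-extracted candidate records.
job = {"title": "Data Engineer", "skills": ["Hadoop", "Spark", "Hive", "SQL"]}
candidates = [
    {"name": "Asha", "years_experience": 5, "skills": ["hadoop", "spark", "sql", "python"]},
    {"name": "Ravi", "years_experience": 1, "skills": ["Hadoop", "Spark", "Hive", "SQL"]},
    {"name": "Mei", "years_experience": 3, "skills": ["SQL", "Excel"]},
]

shortlist = rank_candidates(candidates, job, min_years=2)
print(shortlist)
```

With thousands of applicants, the same scoring runs over every record without anyone reading the resumes first, and the hiring manager starts from the ranked shortlist.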
And that's the most important thing, guys. You can then go on to evaluate
whether the person is fit for the job from the recruiter's perspective,
bringing in the hiring data, preferences, salaries, organizational metrics and
much more. So you can filter candidates just by making use of big data
analytics. You take the candidates' metadata and files and the job metadata and
files, and use the process of ingestion to bring all of these into the big data
framework. Then you make use of the processing, searching and matching engines,
and at the end expose the results through a business application service.
Finally this can be sent to a web browser, where the candidate, or the person
who's hiring, can see it. And then this brings us to the
next use case, which is detecting insurance fraud. We already know that a lot
of insurance fraud happens: the case where a person is trying to claim the
insurance might not be genuine, and it is essential for insurance companies to
determine whether a claim is a fraud attempt or a genuine one. This is done by
taking in information from interview notes, email conversations and social
media sites, and combining all of this unstructured data with official
structured records, transactions and transcripts. At the end of it we can
compare trends and detect patterns in how the customer behaves. Think of online
shopping, for example: the idea is to understand how the customer thinks, how
the customer tends to do things, and how we can detect patterns in that
person's behavior, guys. What this does is help us identify hidden
relationships throughout the person's behavior, which we can then use to
perform data correlation, network analysis and much more. This also helps in
sourcing multiple content repositories of public records, which are used to
find rules. If these rules are being violated, we get what we call red flag
patterns, which suggest that this person is committing fraud and that it is not
a genuine attempt to claim the insurance, guys. So we take all the claim files
and data into the big data framework, which computes the fraud indicators.
You'll also have data warehousing techniques and application APIs, and at the
end of it, again, web browsers where we see the output of all this.
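
The "rules and red flags" idea from the fraud use case can be sketched as follows. Everything specific here is an assumption made up for illustration: the `Claim` fields, the four rules, their thresholds and the two-flag review cutoff. A real system would derive its indicators from data and would also do the correlation and network analysis mentioned in the session, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    amount: float
    days_since_policy_start: int
    prior_claims: int
    notes: str = ""  # unstructured text, e.g. interview notes or emails


# Hypothetical rules: each one maps a claim to True when it looks suspicious.
RULES = {
    "claim_soon_after_policy_start": lambda c: c.days_since_policy_start < 30,
    "many_prior_claims": lambda c: c.prior_claims >= 3,
    "large_amount": lambda c: c.amount > 50_000,
    "suspicious_wording_in_notes": lambda c: any(
        phrase in c.notes.lower() for phrase in ("cash only", "no receipt")
    ),
}


def red_flags(claim):
    """Names of all rules this claim trips, in rule order."""
    return [name for name, rule in RULES.items() if rule(claim)]


def needs_review(claim, threshold=2):
    """Send the claim to a human investigator once enough flags are raised."""
    return len(red_flags(claim)) >= threshold


suspicious = Claim("C-1", 80_000, 10, 0, "Customer says no receipt available")
routine = Claim("C-2", 1_200, 400, 1, "Routine windshield repair")

for claim in (suspicious, routine):
    print(claim.claim_id, red_flags(claim), needs_review(claim))
```

Note that the structured fields (amount, dates, claim history) and the unstructured notes feed the same set of indicators, which mirrors the combining of structured and unstructured data described in the session. A flag is a reason to look closer, not proof of fraud.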
Guys, this brings us to the end of this session. I hope you took away a lot of
information from here. If you have any queries, make sure to head down to the
comments section and let us know; we'd be happy to help at the earliest. On
that note, have a nice day.

