Best Data Science Documentary FULL HD | Data Science vs Machine Learning, AI | English Subtitles

It sometimes seems we’re
being deluged with data. Wave upon wave of news and messages. Submerged by step counts. Constantly bailing out to make room
for more. We buy it, surf it, occasionally
drown in it and with modern technology quantify
ourselves and everything else
with it. Data is the new currency of
our time. Data has become almost a magic word
for…anything. Crime and lunacy and literacy and
religion and…drunkenness. You name it, somebody was gathering
information about it. It offers the ability to be
transformationally positive. It’s, in one sense, just the
reduction in uncertainty. So what exactly is data? How is it captured, stored, shared
and made sense of? The engineers of the data age are
people that most of us have never
heard of, despite the fact that they
brought about a technological and philosophical
revolution, and created a digital world that the
mind boggles to comprehend. This is the story of THE word of our
times… the constant flow of more and
better data has transformed
society… and is even changing our sense of
ourselves. I can’t believe this is my life now. So come on in, because the water’s
lovely. My name is Hannah Fry, I’m a mathematician, and I’d like to
begin with a confession. I haven’t always loved data. The truth is mathematicians just
don’t really like data that much. And for most of my professional life
I was quite happy sitting in a windowless room with my equations,
describing the world around me. You can capture the arc of a perfect
free kick or the beautiful aerodynamics of a race car. The mathematics of the real world is
clean and ordered and elegant, everything that data absolutely
isn’t. There was one moment that helped to
change my mind. It was in 2011 when I came across a
little game that a teenage Wikipedia user called Mark J had invented. Now, Mark noticed that if you hit
the first link in the main text of
any Wikipedia page and then do the same
for the next page, a pattern
emerges. So the page for data, for example, links from “set” to “maths” to “quantity” to “property” and then
“philosophy”, which after a few more links will
loop back onto itself. Now, the page “egg” ends up in the
same place, and even that famously philosophical
boyband One Direction will take you all the way through to “philosophy”, although you have to go through
“science” to get there. The same goes for “fungi”, or
“hairspray”, “marmalade”, even “mice”,
“dust” and “socks”. It was a very strange finding and it
called for some statistics. Another Wikipedia user, Il Mare, wrote a computer program to try
and investigate this phenomenon. Now, he discovered, amazingly, that
for almost 95% of Wikipedia pages, you will end up getting to
“philosophy” eventually. Now, that’s pretty cool, but how did
it change my mind about data? Well, the pattern that Mark J
discovered and the data that was
captured and analysed, it revealed a hidden
mathematical structure, because Wikipedia is just a network
with loops and chains hidden all
over the place and it’s something
that can be described beautifully
using mathematics. For me this was the perfect example
of how there are two parallel
universes. There’s the tangible, noisy, messy
one, the one that you can see and touch
and experience. But there’s also the mathematical
one, where I think the key to our understanding lies. And data is the bridge between those
two universes. Our understanding of everything from
cities to crime, global trade, migration and even disease… it’s all underpinned by data. Take this for example. Rural Wiltshire and a dairy farm gathering data from its cows wearing pedometers. We can’t be out here 24-7. The pedometers help us to have our
eyes and ears everywhere. It turns out when cows go into heat they move around a lot more
than normal. Constant monitoring of their steps
and some background mathematics reveal the prime time for
insemination. We’ll be able to look at the data and within 24 hours there’ll be a greater chance of getting her in
calf. Data-driven farming is now big
business, turning a centuries-old way of life
into precision science. Pretty much every industry you can
think of now relies on data. We all agree that we are undergoing a major revolution in human history. The digital world replacing the
analogue world. A world based on data, made of codes, rather than a
world made of biological or physical data, that is extraordinary. Why philosophy at this stage? Because when you face extraordinary
challenges, the worst thing you can do is to get
close to it. You need to take a long run-up. The bigger the gap, the longer the
run-up. And the run-up is called philosophy. In the spirit of taking a long
run-up, we’ll start with the word itself. “Data” is originally from the Latin
“datum”, meaning “that which is given”. Data can be descriptions… counts, or measures… of anything… any format. It’s anything that when analysed
becomes information, which in turn is the raw material
for knowledge, the only true path to wisdom. Look at the data on data. And before the scientific and
industrial revolution, the word barely gets a look in, in
English. But then it starts to appear in
print as scientists and the state gather, observe and create more and more
of it. This arrival of the age of data
would change everything. Industrial Revolution Britain. For Victorians, booming industry and the growth of
major cities were changing both the landscape and daily life beyond
recognition. Into this scene stepped an unlikely
man of numbers, William Farr, one of the first people to manage
data on an industrial scale. William Farr had a quite unusual
upbringing, in that he was actually the son of a
farm labourer but who had managed to get a medical education, which was
really very unusual for someone of
his class. Farr very quickly became absorbed in
the study of statistics. He was particularly interested, as
you might expect for somebody with
medical training, in public health, life expectancy
and about causes of death. For anyone interested in statistics, there was only one place to be. Somerset House in London was home to the General Register Office, where, in 1839, Farr found his dream job. From up there in the north wing,
William Farr, the apothecary, medical journalist and top
statistician, would really rule the roost. Now, this place was almost like a
factory. Here, they would collect,
process and analyse vast amounts of data. So in would come the census returns,
the records of every single birth, death and marriage in the country,
and out would come the big picture, the usable information that could
help inform policy and reform society. I think it’s sometimes difficult for
us to remember just how little
people knew in the early 19th century about
the changes that Britain was
going through. So when Farr did an analysis of
population density and death rate, he was able to show that life
expectancy in Liverpool was absolutely atrocious. It was far, far worse than the
surrounding areas. This came as a surprise to a lot of
people who believed that Liverpool, a coastal town, was actually quite a
salubrious place to live. At Somerset House, Farr spearheaded
a revolution in the systematic collection of data to uncover the real picture of this
changing society. Its scale and ambition were described
in a newspaper at the time. “In arched chambers of immense
strength and extent are, in many volumes, the genuine certificates of upwards of
28 million persons born into life, married or passed
into the grave.” Here, every person was recorded
equally. A revolutionary idea. “Here are to be found the records of
nonentities, side-by-side with those once
learned in the law or distinguished in literature, art or science.” But what really motivated William
Farr was not just data collection, it was the possibility that data
gathered could be analysed to help overcome society’s greatest ill. Cholera was probably the most feared
of all of the Victorian diseases. The terrifying thing was that you
could wake up in the morning and
feel absolutely fine, and then be dead by the evening. Between the 1830s and the 1860s, tens of thousands died in London
alone. The control of infectious diseases
like cholera, which no-one fully understood, became the greatest
public-health issue of the time. However great London might have
looked back then, it would have smelled absolutely
terrible. At that point, the Victorians didn’t
have really a great way of disposing of human waste, so it would have
flowed down the gutters into open sewers and out into the Thames. Now, the city smelt so bad that it was pretty plausible that the foul
air was responsible for carrying the
disease. Farr collected a huge range of data
during each cholera outbreak to try to identify what put people most at
risk from the bad air. He used income-tax data to try and
measure the affluence of the
different boroughs that were affected by
cholera. He asked his friends at the Royal
Observatory to provide data on the temperature and climatic conditions. But the one that he thought was most
convincing was about the topography. It was about the elevation above
the Thames. Using the data, Farr suggested a
mathematical law of elevation. Its equations described how cholera
mortality falls the higher you live above the Thames. Now, he published his report
in 1852, which the Lancet described as one of the most remarkable productions of type and pen in any age and country. The only problem was that Farr’s
work, although elegant and meticulous, was fundamentally flawed. Farr stuck to the prevailing theory that cholera was spread by air. Such is the power of the status quo. But in 1866, 5,500 people died in just one square
mile of London’s East End, and that data made Farr change his
mind. When Farr came to write his next
report, the data told a different story which proved the turning point in combating the disease. The common factor among those who
died was not elevation or air but sewage-contaminated drinking
water. With this new report, Farr may seem to have contradicted much of his own work, but I think
that this is the perfect example of what
data can do. It provides that bridge essential to
scientific discovery, from theory to proof, problem to
solution. Good data, even in huge volumes, does not guarantee that you will
arrive at the truth. But, eventually, when the weight of
the data tips the balance, even the strongest-held beliefs can
be overcome. Of course, it was the weight of the
data itself which, with the dawn of the 20th century, was becoming increasingly hard to
manage. Data stored long form in things like
census ledgers could take the best part of a decade to process, meaning the stats were often out of
date. When you’re dealing with figures
like these, it’s one thing. But when you’re counting the
population like this it’s quite a
different matter. A deceptively simple
solution got what’s now called the information revolution under way, encoding data as holes punched in
cards. These cards are passed over sorting
machines, each of which handles 22,000 cards a
minute. By the 1950s, data processing and
simple calculations were routinely mechanised, laying the groundwork
for the next generation of data-processing machines. They would be put to pioneering work in a rather unlikely place. In a grand London dining hall, a
group of men and women, many in their 80s and 90s, have
gathered for a special work reunion. At its peak, their employer,
J Lyons, purveyor of fine British tea and
cakes, had hundreds of tea shops
nationwide. There are hundreds of items of food. All these in a varying quantity each
day are delivered to a precise timetable to the tea shops. These people aren’t former J Lyons
bakers or tea-shop managers. They were hired for their
mathematical skills. Lyons had a huge amount of data
which has to be processed, often very low-value data. So, for example, the transaction
from a tea shop would be a cup of tea. But each one had a voucher and had
to be recorded, and had to go to the accounts for business reasons and for
management reasons. Every calculation you did,
not only you had to do it twice, but you had to get it checked by
someone else as well. The handling of these millions and
millions of pieces of data, the storage of that data, are the
key of the business problem. The Lyons team took the world by
surprise when, in 1951, they unveiled the Lyons Electronic
Office, or Leo for short. At this point, only a handful of
computers existed, and they were used solely for
scientific and military research, so a business computer was a radical
reimagining of what this brand-new technology could be for. Each manageress has a standing order
depending on the day of the week. She speaks by telephone to head
office, where her variations are taken quickly onto cards. What the girl hears, she punches. The programme is fed first, laying down the sequence for the
multiplicity of calculations Leo
will perform. It was the first opportunity to process large volumes of clerical work, take all the hard work out of it, and put it on an automatic system. Before Leo, working out an
employee’s pay took an experienced clerk
eight minutes, but with Leo that dropped to an
astonishing one and a half seconds. It was all so exciting because we were breaking new ground
the whole time. Absolutely everything which we did
had never been done before. By anybody anywhere. I don’t think we realised the kind
of transformation we were part of. The post-war years saw a boom in the
application of this new computing technology. Leo ran on paper tape and cards, but soon machines with magnetic tape
and disks were developed, allowing for greater data storage
and faster calculations. As more businesses and institutions
adopted these new machines, application of mathematics to a
whole host of new, real-world challenges took off. And the word “data” went from
relatively obscure to ubiquitous. “Data” has become almost a magic
word for anything. The truth is that it is a kind of
interface today between us and the rest of the
world. In fact, between us and ourselves, we understand our bodies in terms of
data, we understand society in terms of
data, we understand the physics of the
universe in terms of data. The economy, social science, we play
with data, so essentially it is what we
interact with most regularly every day. Data underpins all human
communication, regardless of the format. And it was the desire to communicate
effectively and efficiently that led to one of the most important
academic papers of the 20th century. “A Mathematical Theory of
Communication” has justifiably been called the
Magna Carta for the information age. It was written by a very young and
bright employee of Bell
Laboratories, the American centre for telecoms
research that was founded by one of the inventors of the telephone,
Alexander Graham Bell. Now, this paper was written by
Claude Shannon in 1948 and it would effectively lay out the theoretical
framework for the data revolution that was just beginning. Those that knew him described Shannon as a lifelong puzzle solver
and inventor. To define the correct path it
registers the information in its
memory. Later, I can put him down in any
part of the maze that he has already
explored and he will be able to go directly
to the goal without making a single false turn. During World War II he worked on
data-encryption systems, including one used by Churchill and
Roosevelt. But at Bell Labs, Claude Shannon was trying to solve
the very civilian problem of noisy telephone lines. # There’s a call, there’s a call # There’s a call for you # There’s a call on the phone for
you. # In that analogue world of
20th-century phones, your speech was converted into an
electrical signal using a handset like this and then transmitted down
a series of wires. The voice signals would travel along
the wire, be detected by the receiver at the
other end and then be converted back into sound waves to reach the ear
of whoever had picked up. The problem was, the further the electrical signal
travelled down the line, the weaker it would get. PHONE LINE CRACKLES
Eventually you couldn’t even hear the conversation for the
amount of noise on the line. And you could boost the signal but
it would mean boosting the noise,
too. Shannon’s genius idea was just as
simple as it was beautiful. The breakthrough was converting
speech into an incredibly simple code. ON PHONE: Hello? First the audio wave
is detected, then sampled. Each point is assigned a code of
ones and zeros and the resulting long string of
digits can then be sent down the wire with the zeros as brief
low-voltage signals and ones as brief bursts of
high voltage. From this code, the original audio
can be cleanly reconstructed and regenerated at the other end. ON PHONE: Hello? Shannon was the first person to
publish the name for these ones and zeros, the smallest possible pieces of
information, and they are called bits or
binary digits, and the real power of the bit
and the mathematics behind it applies way beyond telephones. They offered a new way for
everything, including text and pictures, to be
encoded as ones and zeros. The possibility to store and share
data digitally in the form of bits was clearly going to transform the
world. If anyone has to be identified as the genius who developed the foundational science of mathematics for our age, that is certainly
Claude Shannon. Now, one thing has to be clarified, the theory developed by Shannon is
about data transmission and it has nothing to do with meaning, truth,
relevance, importance of the data transmitted. So it doesn’t matter whether the
zero and one represent an answer to, “Heads or tails?”,
or to the question,
“Will you marry me?”, for a theory of information is data
anyway and if it is a 50-50 chance that you will or will not marry me
or that it is heads or tails, the amount of information,
the Shannon information, communicated is the same. Shannon information is not
information like you or I might
think about it. Encoding any and every signal using
just ones and zeros is a pretty remarkable breakthrough. However, Shannon also came up with a
revolutionary bit of mathematics. That equation there is the reason
you can fit an entire HD movie on a flimsy bit of plastic or the reason
why you can stream films online. I’ll admit, it might not look too
nice, but… don’t get put off yet, because I’m
going to explain how this equation works using Scrabble. Imagine that I created a new
alphabet containing only the letter A. This bag would only have A tiles
inside it and my chances of pulling out an A
tile would be one. You’d be completely certain of what
was going to happen. Using Shannon’s maths, the letter A contains zero bits of
what’s called Shannon information. Let’s say then I got a little bit
more creative, but not much, and had an alphabet with two
letters, A and B, and equal numbers
of both in this bag. Now my chances of pulling out an A
are going to be a half and each letter contains one bit
of Shannon information. Of course, when transmitting real
messages, you’ll use the full alphabet. But English, as with every other language,
has some letters that are used more frequently than others. If you take a quite common letter
like H, which appears about 5.9% of the
time, this will have a Shannon information
of 4.1 bits. And incidentally, a Scrabble score
of four. Of course, there are some much more
exotic and rare letters, like Z, for instance, which appears
about 0.07% of the time. That gives it 10.5 bits and Scrabble score of ten. Bits measure our uncertainty. If you’re guessing a three-letter
word and you know this letter is Z, it gives you a lot of information
about what the word could be. But if you know it’s H, because it is a more common letter
with less information, you’re more uncertain about the
answer. Now if you wrap up all that
uncertainty together, you end up with this,
the Shannon entropy. It’s the sum of the probability of each symbol turning up times the number of bits in each symbol. And this very clever bit of insight and mathematics means that the code for any message can be quantified. Not every letter, or any other
signal for that matter, needs to be encoded equally. The digital code behind a movie
like this one of my dog, Molly, for example, can usually be compressed by up to
50% without losing any information. But there’s a limit. Compressing more might make it
easier to share or download, but the quality can never be the
same as the original. DOG BARKS You can’t really overstate the
impact that Shannon’s work has had, because without it we
wouldn’t have JPEGs or Zip files or HD movies or digital
communications. But it doesn’t just stop there,
because while the mathematics of information theory doesn’t tell you
anything about the meaning of data, it does begin to open up a
possibility of how we can understand ourselves and our society, because pretty much
anything and everything can be measured and encoded as data. We say that signals flow through
human society, that people use signals to get
things done, that our social life is, in many
ways, the sending back and forth of
signals. So what is a signal? It’s, in one sense, just
the reduction in uncertainty. What it means to receive a signal is
to be less uncertain than you were before and so, another way to think
of measuring or quantifying signal is in that change in uncertainty. Using Shannon’s mathematics to
quantify signals is common in the world of
complexity science. It’s rather less familiar to
historians. I love maths, I love its precision,
I love its beauty. I absolutely love its certainty, and that, Simon can
bring that mathematical worldview, that mathematical certainty
to what I work with. The reason behind this remarkable
marriage between history and science is the analysis of the largest
single body of digital text ever collated about ordinary people. It’s the Proceedings of London’s
Old Bailey, the central criminal court of
England and Wales, which hosted close to 200,000 trials
between 1674 and 1913. There are 127 million words of
everyday speech in the mouths of orphans and women
and servants and ne’er-do-wells, of criminals, certainly, but also people from every rank and
station in society. And that made them unique. What’s exciting about the Old Bailey
and the size of the dataset, the length and magnitude of it, is that not only can we detect a
signal, but we are able to look at
that signal’s emergence over time. Shannon’s mathematics can be used to
capture the amount of information in every single word, and like the alphabet,
the less you expect a word, the more bits of information it
carries. Imagine that you walk into a
courtroom at the time and you
hear a single word, the question we ask is how much
information does that word carry about the nature of the crime being
tried? You hear the word “the”. It’s common across all trials and
so gives you no bits of information. Most words you hear are poor signals
of what’s going on. But then you hear “purse”. It conveys real information. Then comes “coin”, “grab” and “struck”. The more rare a word, the more bits
of information it carries, the stronger the signal becomes. One of the clearest signals that we
see in the Old Bailey, one of the clearest processes that
comes out, is something that is known as
the civilising process. It’s an increasing sensitivity to,
and attention to, the distinction between violent and
nonviolent crime. If, for example, somebody hit you
and stole your handkerchief, in the 18th-century
context, in 1780, you would concentrate on the
handkerchief. More worried about a few pence worth
of dirty linen than the fact that somebody just broke your nose or
cracked a rib. The fact that 100 years later,
by 1880, every concern, every focus, both
in terms of the words used in court, but also in terms of what people
were brought to court for, focus on that broken nose and
that cracked rib, speaks to a fundamental change in
how we think about the world and how we think about how social
relations work. Look at the strongest word signals
for violent crime across the period. In the 18th century, the age of
highwaymen, words relating to property theft
dominate. But by the 20th century, it’s physical violence itself
and the impact on the victim that carry the most weight. That notion that one can trace
change over time by looking at language and how it’s
used, who deploys it in what context, that I think gives this kind of work
its real power. There are billions of words,
there’s all of Google Books, there’s every printed newspaper, there is every speech made in
Parliament, every sermon given at most churches. All of it is suddenly data
and capable of being analysed. The rapid development of computers
in the mid 20th century transformed our ability to encode,
store and analyse data. It took a little longer for us to
work out how to share it. This place is home to one of the
most important UK scientific
institutions, although it’s one you’ve probably
never heard of before. But since the 1900s, this place has
advanced all areas of physics, radio communications, engineering,
materials science, aeronautics, even ship design. NPL, the National Physical
Laboratory, in south-west London is where the
first atomic clock was built and where radar and the Automatic
Computing Engine, or Ace, were
invented. The Ace computer was the brainchild
of Alan Turing, who came to work here right after
the Second World War. Now, Turing’s contributions to the
story of data are undoubtedly vast, but more important for our story is
another person who worked here with Turing, someone who arguably is even
less well known than this place, Donald Davies. Davies worked on secret British
nuclear weapons research during the
war… later joining Turing at NPL, climbing the ranks to be put in
charge of computing in 1966. As well as the new digital
computers, Davies had a lifelong fascination
with telephones and communication. His mother had worked in the Post
Office telephone exchange, so even when he was a kid, he had a real understanding of how
these phone calls were routed and rerouted through this growing
network, and that was the perfect training
for what was to follow. What was Donald Davies like, then? He was a super boss because he was
very approachable. Everybody realised he’d got huge
intellect but not difficult with it. Very nice guy. Davies’ innovation was to develop,
with his team, a way of sharing data between
computers, a prototype network. Donald had spotted that there was a
need to connect computers together and to connect people to computers,
not by punch cards or paper tape or on a motorcycle, but
over the wires, where you can move files or
programs, or run a program remotely on another
computer, and the telephone network is
not really suited for that. In the pre-digital era, sending an encoded file along a
telephone line meant that the line was engaged for as long as the
transmission took. So the opportunity here was because
we owned the site, 78 acres with some 50 buildings,
we could build a network. Davies’ team sidestepped the
telephone problem by laying high-bandwidth data cables
before instituting a new way of moving data around the network. The technique he came up with
was packet switching, the idea being that you take
whatever it is you’re going to send, you chop it up into uniform pieces,
like having a standard envelope, and you put the pieces into the
envelope and you post them off and
they go separately through the network and
get reassembled at the far end. To demonstrate this idea, Roger and I are convening NPL’s first-ever packet-switching
data-dash… which is a bit more complicated
than your average sports-day event. The course is a data network. There are two computers, represented here as the start and finish signs. Those computers are connected by a series of network cables and nodes. In our case, cables are lines of cones and the connecting nodes are
Hula Hoops. Having built it, all we need now
are some willing volunteers. And here they are. NPL’s very own apprentices. So welcome to our packet-switching
sports day. We’ve got two teams, red and blue. ‘Both teams are pretending to be
data and they’re going to have to race.’ You’re going to start over there where it says “start”, kind of
obvious, and you’re trying to get through to
the end as quickly as you possibly can. You can’t just go anywhere, you have to go through these hoops
to get to the finish line, these little nodes in our network. You’re only allowed to travel along
the lines of the cones, but only if there’s nobody else
along that line. All clear? OK, there is one catch. All of you who are in the red team, we are going to tie your feet
together. So you’ve got to travel round our
network as one big chunk of data. Those of you who are in blue, you
are allowed to travel on your own, so it’s slightly easier. ‘The objective is for both teams to
deposit their beanbags in the goal in the right order,
one to five.’ EXCITED CHATTER Get in the hoop! Get in the hoop! Bring out your competitive
spirit here. We’ve got packets versus big chunks
of data. I’m going to time you.
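The race rules can be turned into a toy timing model. This is only a sketch under stated assumptions, not a measurement of the actual course. It assumes the network is a single chain of links, that each link carries one unit of data per time step and holds one piece at a time, and that a node passes a piece on only once the whole piece has arrived. The function name `transfer_time` and the figure of three hops are made up for illustration.

```python
def transfer_time(message_units: int, piece_units: int, hops: int) -> int:
    """Time steps for a message to cross `hops` links in series.

    Assumptions (illustrative only): every piece is a uniform size, with the
    last one padded if needed; a link moves one unit per step and holds one
    piece at a time; a node forwards a piece only after it has fully arrived.
    """
    pieces = -(-message_units // piece_units)  # ceiling division
    # The first piece needs hops * piece_units steps to reach the far end.
    # Each later piece follows one piece-time behind the one in front of it.
    return hops * piece_units + (pieces - 1) * piece_units


# Red team: five units tied together as one big chunk.
red = transfer_time(message_units=5, piece_units=5, hops=3)
# Blue team: the same five units travelling as separate packets.
blue = transfer_time(message_units=5, piece_units=1, hops=3)
print(red, blue)  # 15 7
```

Under these assumptions the tied-together chunk has to clear each link completely before it can start on the next, while the separate packets follow one another through the links, which is why they finish sooner.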
Everyone ready? OK, over to you, Roger. TOOT! Remember, you can’t go down the
route until it’s clear. ‘The red and blue teams are exactly
the same size, let’s say five megabytes each. But their progress through the
network is clearly very different.’ THEY LAUGH OK, blues, you took 13 seconds,
pretty impressive. Reds, 20 seconds. That’s a victory for the packet
switchers. Well done, you guys! Well done, you
guys. The impact that packet switching has
had on the world, I mean, it sort of came from here and
then spread out elsewhere. It did indeed, we gave the world
packet switching, and the world, of course, being America,
they took it on and ran with it. This little race, Donald Davies’
packet switching, was adopted by the people that would
go on to build the internet, and today, the whole thing still
runs on this idea. Let’s say I want to e-mail you
a picture of Molly. First, it will be broken up into
over 1,000 data packets. Each one is stamped with the address
of where it’s from and where it’s going to, which routers check
to keep the packets moving. Regardless of the order they arrive, the image is reassembled,
and there she is. This is quite a cool thing, right, that you’ve got one of the original
creators of packet switching right here
and you can ask him… Every time you’re like…
Well, do anything, really. “Why is my internet running
so slowly?” THEY LAUGH
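The e-mail example above (break the picture into packets, stamp each one with where it is from and where it is going, reassemble whatever order they arrive in) can be sketched in a few lines of Python. This is a minimal illustration, not how any real network stack is written. The `Packet` fields, the 1,024-byte packet size, the e-mail addresses and the stand-in picture bytes are all assumptions made for the example. In particular, the sequence number `seq` is added here so the pieces can be put back in order; the narration only mentions the two addresses.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    src: str        # address the packet is from
    dst: str        # address the packet is going to
    seq: int        # position of this piece in the original data (assumed field)
    payload: bytes  # the piece of data itself


def packetize(data: bytes, src: str, dst: str, size: int = 1024) -> list[Packet]:
    """Chop data into pieces of at most `size` bytes, each stamped with addresses."""
    return [
        Packet(src, dst, seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the original data regardless of the order the packets arrived in."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


# Hypothetical stand-in for the picture: 1,536,000 bytes.
picture = bytes(range(256)) * 6000
packets = packetize(picture, src="hannah@example.org", dst="you@example.org")
random.shuffle(packets)  # packets may take different routes and arrive out of order
restored = reassemble(packets)
print(len(packets), restored == picture)  # 1500 True
```

Sorting on the sequence number is what lets the receiver ignore arrival order, so each packet can take whichever route happens to be free.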
Don’t ask me! We’ve come a very long way in just a
few decades. Around 3.4 billion people now have
access to the internet at home and there are around four times the
number of phones and other data-sharing devices
online, the so-called Internet of Things. Just by being alive in the 21st
century with our phones, our tablets,
our smart devices, all of us are familiar with data. Really embrace your inner nerd here,
because every time you wander around looking at your screen, you are
gobbling up and churning out absolutely tons of the stuff. Our relationship with data
has really changed – it’s no longer just for specialists,
it’s for everyone. There’s one city in the UK that’s
putting the sharing and real-time analysis of data at the heart
of everything it does – Bristol. Using digital technology,
we take the city’s pulse. This data is the route to an open,
smart, liveable city, a city where optical, wireless and
mesh networks combine to create an open,
urban canopy of connectivity. Taking the pulse of the city under
a canopy of connectivity might sound a bit sci-fi, or like
something from a broadband advert. But if you just hold on to your
cynicism for a second, because Bristol are trying to build
a new type of data-sharing network
for its citizens. There’s a city-centre area which
now has next-generation or maybe the generation after next of superfast broadband and then
that’s coupled to a Wi-Fi network,
as well. The question is, what can
you do with it? We would have a wide area network
of very simple Internet of Things sensing devices that just monitor
a simple signal like air quality or traffic queued in a traffic jam. Once you’ve got all this network
infrastructure, you can get an awful lot,
a really huge amount of data arriving to you in real time. What’s happening here is a
city-scale experiment to try and develop and test what’s going to be called the
programmable city of the future. It relies on Bristol’s futuristic
network, vast amounts of data from as many
sensors as possible and a computer system that can
simulate and effectively reprogram the city. The computer system can intervene. It could reroute traffic and we can
actually radio out to individuals, so maybe they get a message on their
smartphone or perhaps a wrist-mounted device, saying, “If you have asthma,
perhaps you should get indoors.” Once you create that capacity for
anything and everything in the city to be connected together, you can really start to re-imagine
how a city might operate. We are starting to experiment with
driverless cars and, in order for driverless cars to work, they have to be able to communicate
with the city infrastructure. So, your car needs to speak to the
traffic lights, the traffic lights need to speak to
the car, the cars to speak to
each other. All of that requires a completely
different set of infrastructure. Of course, as the amount of data a
city can share grows, the computing power needed to do
something useful with it must
grow, too. And for that, we have the cloud. For example, imagine trying to
analyse all of Bristol’s traffic
data, weather and pollution data on your
home computer. It could take a year. Well, you could reduce that to a day
by getting 364 more computers, but that’s expensive. A cheaper option is sharing the
analysis with other computers over
the internet, which Google worked out first, but
they published the basics and now free software exists to help
anyone do the same. Big online companies rent their
spare computers for a few pence an hour. So, now anyone like me or you can do big data analytics quickly
for a few quid. Such computing power is something we
could never have dreamt of just a few years ago, but it will
only fulfil its potential if we can share our own data in a
safe and transparent way. If Bristol Council wanted to know
where your car was at all times but could use that information to
sort of minimise traffic jams, how would you feel about something
like that? Er, I’m not sure if I’d
particularly like it. I think it is up to me where
I leave my car. I understand the idea of justifying
it with all these great other ideas, but I still probably wouldn’t like
it very much. If they are using it for a better
purpose, then yeah, but one should know how they are
using it and why they’ll be using
it, for what purpose. I’d like to imagine a world in which
all the data that was retained was used for the greater good
of mankind, but I can’t imagine a circumstance
like that in the world that we have today. We live in a modern society, where
if you don’t let your data out there, not in the
public domain, but in a secure business domain, then you can’t take part in
society, really. Unsurprisingly, people are pretty
wary about what happens to
their data. We need to be careful that civil
liberties are not eroded, because otherwise the technology is
likely to be rejected. I think it’s an area where us as a
society have yet to sort of fully understand what the correct way
forward is and therefore it is very much a
discussion. It’s not a lecture, it’s not a code, it’s one where we are
co-producing and co-forming these
sorts of rules with people in the city, in order to
sort of help us work out what the right and wrong things to do are. It will be intriguing to watch
Bristol grapple with
the technological and ethical challenges of being our
first data-centric city. In all these contexts, Internet
of Things… forms of health care,
smart cities, what we’re seeing is an increase
in transparency. You can see through the body, you
can see through the house, you can see through the city and the
square, you can see through society. Now, transparency may be good. It’s something that we may need to
handle carefully in order to extract the value from those data to improve your lifestyle, your
social interactions, the way in which your city works
and so on. But it also needs to be carefully
handled, because it’s touching the ultimate nerve of what it means
to be human. So how much data should you
give away? Traffic management is one thing but
when it comes to health care, the stakes, the risks and benefits
are even higher. And in Bristol, with a project
called Sphere, they’re pushing the boundaries
here, too. The population is getting older, and
an ageing population needs more intense health care, but it’s
very difficult to pay for that
health care in institutions, paying for nurses
and doctors. So, the key insight of the Sphere
team was that it’s now possible to arrange, in a house, lots of
small devices where each device is monitoring a
simple set of signals about what’s going on in that house. There might be monitors for your
heart rate or your temperature, but there might also be monitors
that notice, as you’re going up and
down stairs, whether you’re limping or not. They’ve invited me to go and spend a
night in this very experimental house, but
unfortunately, I’m not allowed to
tell you where it is. The project is a live-in experiment
and will soon roll out to 100 homes across Bristol. It’s a gigantic data challenge,
overseen by Professor Ian Craddock. So, that’s one up there, then? Yes, that’s one of the video sensors and we have more sensors
in the kitchen. We have another video camera in the
hall and some environmental sensors, and a few more in here. The house can generate 3-D video, body position, location and
movement data from a special wearable. How much data are you
collecting, then? So, when we scale from this house to
100 houses in Bristol, in total we’ll be storing over two
petabytes of data for the project. Lord. So, on my computer at home, I
don’t even have a terabyte
hard drive and you’re talking about 20,000
of those. Yes. I mean, you know, the
interaction of people with their
environment and with each other is a very
complicated and very variable thing and that’s why it is a very
challenging area, especially for data analysts, machine learners, to make sense of
this big mass of data. I’m happy to find out that the
research doesn’t call for cameras in the bedroom
or bathroom, but I do have to be left entirely on
my own for the night. The very first thing I’m going to do
is pour myself a nice bloody big glass of wine. There we go. So, that nice glass of wine that I’m
enjoying isn’t completely
guilt-free, because I’ve got to admit to it to
the University of Bristol. I have to keep a log of everything
I do, so that the data from my stay can be
labelled with what I actually got
up to. In this way, I’ll be helping the
process of machine learning, teaching the team’s computers how to
automatically monitor things like cooking, washing and sleeping, signals in the data of normal
behaviour. In the interests of science. ‘I was also asked to do some things
that are less expected.’ Oh! I spilled my drink. ‘The team need to learn to detect
out-of-the-ordinary behaviour, too, if they want to, one day, spot
specific signs of ill health.’ Right, I’m going to run this back to
the kitchen now. It’s a fairly strange experience. I think the temperature sensors, the
humidity sensors, the motion sensors, even
the wearable I don’t have a problem with at all. For some reason the body position is
the one that’s getting me. On the flipside, though, I would go
absolutely crazy to have this data. This is the most wonderful…
My goodness me. Everything you could learn about
humans. It would be so brilliant. One thing I wanted to do was to do
something completely crazy just to see if they can spot it in
the data. Just to kind of
test them. OK, ready? I can’t believe this is my life now. ‘Anyone can get the data from my
stay online if they fancy trying to
find my below-the-radar escape. The man in charge of machine
learning, Professor Peter Flach, has the first look.’ Between nine and ten, you
were cooking. Correct. Then you went into the lounge. You
had your meal in the lounge. You know what? I ate on the sofa. And you were watching
crap television. I was watching crap television? I’ve been found out. We didn’t switch the crap-television
sensor on. That’s not on here,
but OK. So, you were in the lounge sort of
until 11:30. Correct. Then you went upstairs, there’s a
very clear signal here. And then, from then on, there isn’t
a lot of movement. I was in bed. So, I guess you were in bed. Sleeping. Normal activities, like cooking or
being in bed, are relatively straightforward to spot. But what about the weird stuff? This is yesterday, again. I can see it. I can see the moment. You can see the moment?
I can see it, yeah. There’s something happening here
which is sort of rather quick. You’ve been in the lounge for quite
a while and then, suddenly, there’s a brief move to the
kitchen here and then very quick cleaning up
in the lounge. I wasted good wine on
this experiment. Good wine? Humans are extraordinarily good at
spotting most patterns. For machines, the task is much more
challenging, but, once they’ve learned what
to look for, they can do it tirelessly. I suppose, in the long run, if you are going to scale this up to
more houses, you can’t have people sifting
through these graphs trying to
find… I mean, you have to train computers
to do them. You have to train computers to do
them. One challenge that we are
facing is that our models, our machine learning classifiers and
models, need to be robust against changes in layout, changes
in personal behaviour, changes in the number of people that
are in a house. And maybe we are wildly optimistic
about what it can do, but we are in the process of trying
to find out what it can do, at what cost, at what… invasion into privacy, and then we
can have a discussion about whether, as a society, we want this or not. If this type of technology
rolls out, machines will be modelling us in
mathematical terms and intervening to help keep us
healthy in real time – and that’s completely new. It’s true that our fascination with
machine, or artificial, intelligence is as old as computers themselves. Claude Shannon and Alan Turing both
explored the possibilities of machines that could learn. But it’s only today, with torrents of data and
pattern-finding algorithms, that intelligent machines will
realise their potential. You’ll hear a lot of heady stuff
about what’s going to happen when we mix big data with artificial
intelligence. A lot of people, understandably, are
very anxious about it. But, for me, despite how much the
world has changed, the core challenge is the same as it
always was. It doesn’t matter if you are William
Farr in Victorian London trying to understand cholera or in one of
Bristol’s wired-up houses, all you’re trying to do is to
understand patterns in the data using the language of mathematics. And machines can certainly help us
to find those patterns, but it takes us to find the
meaning in them. We should be worried about what
we’re going to do with these smart
technologies, not about the smart
technologies in themselves. They are in our hands to shape
our future. They will not shape our futures
for us. In the blink of an eye, we have
gone from a world where data, information and knowledge belonged
only to the privileged few, to what we have now, where it
doesn’t matter if you’re trying to
work out where to go on holiday next or
researching the best cancer
treatments. Data has really empowered all of us. Now, of course, there are some
concerns about big corporations hoovering up the data traces that we
all leave behind in our everyday
lives, but I, for one, am an optimist as
well as a rationalist and I think that if we can marshal
together the power of data, then the future lies in the hands of
the many and not just the few. And that, for me, is the real joy
of data. MUSIC: Good Vibrations
by The Beach Boys
