Why AI Cannot Survive Without Big Data

It may come as no surprise that the internet has been swelling with an ever-increasing amount of data, so much so that it has become difficult to keep track of. In 2005 we were dealing with barely 0.1 zettabytes of data; this number is now just above 20 zettabytes and is estimated to reach a staggering 47 zettabytes by 2020. Apart from the sheer quantity, the problem is that most of this data is unstructured. And there’s nothing more harmful for mankind than providing AI with incomplete or inaccurate data.

It seems that only about 10% of this data is structured, while the rest is a great jumble of information that isn’t tagged and cannot be used constructively by machines. To make the distinction concrete: email does not qualify as structured data, while something like a spreadsheet is considered tagged and can be scanned successfully by machines.
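To make the spreadsheet-versus-email contrast tangible, here is a minimal Python sketch. The sample data, the field names and the regular expressions are all invented for illustration; it is not meant as a real extraction pipeline, only as a way to show why tagged data is so much easier for a machine to read.

```python
import csv
import io
import re

# Structured: a spreadsheet exported as CSV. The column names act as tags.
# (Hypothetical sample data.)
SPREADSHEET = """patient_id,age,blood_pressure
101,54,130/85
102,61,142/90
"""

# Unstructured: the same kind of facts buried in a free-text email.
# (Hypothetical sample text.)
EMAIL = """Hi team,
I saw patient 103 this morning. She is 47 and her blood pressure
was around 125 over 80. Let's follow up next week.
"""


def read_structured(text):
    """Every value arrives already labelled by its column header."""
    return list(csv.DictReader(io.StringIO(text)))


def read_unstructured(text):
    """Hand-written patterns; any rewording of the email can break them."""
    patient = re.search(r"patient (\d+)", text)
    age = re.search(r"is (\d+)", text)
    pressure = re.search(r"(\d+) over (\d+)", text)
    return {
        "patient_id": patient.group(1) if patient else None,
        "age": age.group(1) if age else None,
        "blood_pressure": (
            f"{pressure.group(1)}/{pressure.group(2)}" if pressure else None
        ),
    }


rows = read_structured(SPREADSHEET)
guess = read_unstructured(EMAIL)
print(rows[0]["age"])   # read directly from a labelled column
print(guess["age"])     # recovered only because the wording happened to match
```

The structured reader needs no knowledge of the content at all, while the email reader depends on guesses about phrasing that would have to be rewritten for every new author.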

This may not seem that problematic, but we need to have clean and organized data if we expect AI to improve our lives in sectors such as healthcare, driverless cars, connected homes and so on. The irony is that we’ve become really good at creating content and data, but we haven’t yet figured out a way to accurately leverage it to serve our needs.

Data Scientists Are Also Struggling

It’s only natural that data science is one of the fields that has gained a lot of ground in recent years, with more and more data scientists dedicating their careers to sorting out the mess. However, a recent survey shows that, contrary to popular opinion, data scientists spend far less time building algorithms and mining data for patterns than they do on so-called digital janitorial work: cleaning and organizing data. These numbers are certainly not in favor of a bright AI future.
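For readers who have never done it, this is roughly what that janitorial work can look like. The records, the two accepted date formats and the deduplication key below are assumptions made up for this sketch; real projects will have their own rules.

```python
from datetime import datetime

# Hypothetical raw records with typical problems: stray whitespace,
# inconsistent casing, mixed date formats, a duplicate and a missing value.
RAW = [
    {"name": "  Alice ", "signup": "2018-03-01", "country": "us"},
    {"name": "alice", "signup": "01/03/2018", "country": "US"},
    {"name": "Bob", "signup": "", "country": "De"},
]

# Date formats assumed for this sketch (ISO, then day/month/year).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Try each known format; return None when nothing matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def clean(records):
    """Normalize fields and keep the first record per (name, country)."""
    seen = set()
    cleaned = []
    for rec in records:
        row = {
            "name": rec["name"].strip().title(),
            "signup": parse_date(rec["signup"].strip()),
            "country": rec["country"].strip().upper(),
        }
        key = (row["name"], row["country"])
        if key in seen:
            continue  # duplicate of a record we already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned


cleaned = clean(RAW)
for row in cleaned:
    print(row)
```

Even in this toy case, most of the code deals with quirks of the input rather than with any analysis, which is the imbalance the survey describes.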

Predictors of the impending wipe-out of humankind by AI have clearly not considered that, although machines may successfully replace the few data scientists who actually mine data for patterns, they may not be able to replace the vast majority who devote most of their time to collecting, cleaning and organizing this data. Of course, it’s better to collect data in a more complete and consistent way from the get-go than to spend so much time and so many resources ‘fixing’ it retroactively. Fortunately, leaders in AI have slowly reached this understanding as well, using their skills and influence to redirect the path on which data science, and implicitly AI with it, is headed.

AI Is Good, But It’s Not Yet Human-good

We’ve all heard of cases where machines proved superhuman when pitted against actual humans, such as when the best Go player in the world was defeated by Google’s AlphaGo AI. However, this only shows that AI is capable of staggering results in niche tasks; its overall capacity is still no match for human capabilities. There are lots of subtleties and logical steps that AI simply cannot deal with.

AI’s limitations are even more noticeable when it comes to dealing with financial filings and legalese. It’s the same issue here as it is everywhere else. As long as AI machines are not fed structured data, such as standardized contracts, they will get seriously confused. This means that for the time being, it’s still up to qualified data scientists to undo the mess.

Effective AI Is Possible Only When Everyone Works As a Team

Highly qualified data analysts are expensive to hire, which makes it even harder to advance in this field. The key is to go through the collection and modeling phases armed with technology that can streamline the process.

Another key aspect is the joint effort of multiple departments to tackle the problems that big data poses. Financial and technical experts need to join hands in order to identify potential flaws in the collected data from the start. The way in which these experts tackle a problem should also be recorded so that machines can then replicate it. The goal is to create quality assurance algorithms that can pinpoint modelled results that were connected to errors in the past. The more such models we are able to create, the less room there will be for data errors and irregularities.
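One simple way to picture such a quality assurance algorithm is as a set of recorded expert checks that a machine replays over every new result. The sketch below assumes a hypothetical financial result with `revenue`, `margin` and `currency` fields, and the three rules are invented examples, not rules from any real system.

```python
# Each rule encodes a check an expert once applied by hand.
# Rule names and thresholds are hypothetical examples.
RULES = {
    "negative_revenue": lambda r: r["revenue"] < 0,
    "margin_out_of_range": lambda r: not 0 <= r["margin"] <= 1,
    "missing_currency": lambda r: not r.get("currency"),
}


def flag(results):
    """Return (index, [rule names]) for every result that trips a rule."""
    report = []
    for i, result in enumerate(results):
        hits = [name for name, check in RULES.items() if check(result)]
        if hits:
            report.append((i, hits))
    return report


# Hypothetical modelled results to audit.
results = [
    {"revenue": 1200.0, "margin": 0.31, "currency": "USD"},
    {"revenue": -50.0, "margin": 0.10, "currency": "USD"},
    {"revenue": 900.0, "margin": 1.7, "currency": ""},
]

flagged = flag(results)
for index, reasons in flagged:
    print(index, reasons)
```

Keeping the rules in a plain dictionary means each new lesson learned by the experts can be added as one more entry, which is one way to read the idea that more models leave less room for errors.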

AI Cannot Survive Without Big Data

Regardless of the direction AI is taking, whether good or bad for mankind, one thing is for sure: AI cannot go anywhere without big data. We already have examples from our daily lives, which we most likely take for granted, that prove how necessary big data was to their existence. Take, for example, Cortana or Siri. They are able to understand our questions and queries only because they’ve been fed endless amounts of information that helped them understand our natural language. Google has become this giant omniscient power that knows so much about each and every one of us only as a result of our numerous daily entries in its search engine. In the same way, companies are able to make accurate reports, for example those that identify websites using Revcontent, only thanks to the neatness with which that data was initially collected.

Since AI is so deeply connected to big data, it only makes sense to give it access to clean, structured data that it can process in a way that improves our lives. Fortunately, the world is gradually becoming more understanding of the needs behind AI advancements. This is why we are noticing an improvement in how data scientists are supported in their jobs in terms of funding, wages, tools and equipment.

This awareness is slowly spreading across the globe, enabling companies and experts to cooperate in order to collect data more efficiently, establish models that can further help machines clean and structure data, and lay the groundwork for future generations. Knowing where the issues with AI and big data stem from means that the problem is halfway solved.

The post Why AI Cannot Survive Without Big Data appeared first on SmartData Collective.



