While describing various algorithms, I have casually used the term "data" without considering what data actually is. Although I have a vague idea of what it is, honestly, I am not sure (I have briefly talked about big data in one of my blog posts). Generally speaking, data refers to information that a computer transmits or stores. However, there are many other definitions, and most of them can be tied to types of data. Data can therefore be viewed as information formatted in a specific manner (and thus, when it is transmitted, conforming to a particular communication protocol). This leads to the following truth: computer networks can't exist without both communication protocols and data.
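To make the idea of "information formatted in a specific manner" a bit more tangible, here is a minimal sketch using Python's standard `struct` module. The four bytes are made up for illustration. The point is only that the same bytes mean different things depending on the format the reader assumes.

```python
import struct

# Four raw bytes (made up for this example); on their own they carry no meaning.
raw = bytes([0x42, 0x28, 0x00, 0x00])

# Interpretation 1: a big-endian unsigned 32-bit integer.
as_int = struct.unpack(">I", raw)[0]

# Interpretation 2: a big-endian IEEE 754 single-precision float.
as_float = struct.unpack(">f", raw)[0]

# Interpretation 3: the first two bytes read as ASCII characters.
as_text = raw[:2].decode("ascii")

print(as_int, as_float, repr(as_text))
```

Sender and receiver have to agree on one of these readings in advance, which is roughly the part a protocol plays.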
The Collection of Instructions to Manipulate Data: Programs
The term big data was coined as technology evolved and the number of data sources grew. In one of my previous blog posts, I defined big data as the combination of many different data types and formats with massive amounts of data, for which databases must provide huge capacities. To analyze this data, it is necessary to access the values being searched for quickly and to process them with high performance.
However, the meaning of data expands way beyond the processing of data in computing applications. Depending on your field, you will probably encounter different types of data such as audio, video, text, and the bytes and bits inside the memory of electronic devices, amongst others. Whatever the data is and whatever format it is delivered in to power a certain application, it is the data scientist's job to clean up the messy data so that it is ready to be fed into a program.
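As a rough illustration of what "cleaning up messy data" can mean in practice, here is a small sketch in plain Python. The records, the field names `name` and `age`, and the function `clean_records` are all hypothetical. Real cleaning pipelines depend heavily on the data at hand.

```python
def clean_records(records):
    """Normalize hypothetical raw records into a consistent shape.

    Drops records with a missing name or a non-numeric age,
    trims whitespace, and removes duplicates.
    """
    cleaned = []
    seen = set()
    for rec in records:
        # Normalize the name: treat None as empty, trim, and title-case.
        name = (rec.get("name") or "").strip().title()
        # Ages may arrive as int or as padded strings; compare them as text.
        raw_age = str(rec.get("age", "")).strip()
        if not name or not raw_age.isdigit():
            continue  # unusable record
        key = (name, int(raw_age))
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        cleaned.append({"name": name, "age": int(raw_age)})
    return cleaned


# Made-up input with typical problems: padding, mixed types, gaps, duplicates.
raw_records = [
    {"name": "  alice example ", "age": "36"},
    {"name": "Alice Example", "age": 36},
    {"name": "", "age": "41"},
    {"name": "Bob Sample", "age": "n/a"},
    {"name": "carol test", "age": " 85 "},
]
cleaned_records = clean_records(raw_records)
print(cleaned_records)
```

Silently dropping unusable records keeps the sketch short. In a real project you would probably want to log or count what gets dropped, since that is one of the places where bias can sneak in.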
What is Data Science?
Data science can be seen as the process of using algorithms, methods, and systems to extract knowledge and insights from structured as well as unstructured data. To do so, machine learning and data analytics are used to make predictions, facilitate optimization, and improve programs and applications.
There are downsides when data is used incorrectly: models have been shown to turn up widely biased results, sometimes even racist and sexist ones. Therefore, it is imperative to ask the following questions when looking into AI and machine learning applications:
1. What is the model?
2. What is the prediction accuracy (and what is it based on)?
3. What is the goal of the model, and is the training data relevant to that?
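The second question can be made concrete with a small sketch. The labels below are made up, and the "model" is a hypothetical one that never predicts the rare class. The sketch only shows why an accuracy number means little until you know what it is based on.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Predict the most frequent label for every example."""
    most_common_label, _ = Counter(y_true).most_common(1)[0]
    return [most_common_label] * len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in actual) / len(actual)


# Made-up, imbalanced labels: nine negatives and one positive.
y_true = [0] * 9 + [1]
# Hypothetical model that never predicts the positive class.
y_model = [0] * 10

model_acc = accuracy(y_true, y_model)
baseline_acc = accuracy(y_true, majority_baseline(y_true))
model_recall = recall(y_true, y_model)

print(model_acc, baseline_acc, model_recall)
```

Here the model reaches the same accuracy as blindly guessing the majority label, while finding none of the positive cases. That is the kind of detail the accuracy figure alone would hide.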
Only if these questions are explored and approached critically can the power of these solutions and applications be valued appropriately. As always, ask a lot of questions, and stay curious!
For this blog post, I have used the following sources:
Article about data science and the definition of data.
This one, more towards data science. And last but not least, this one about questions about machine learning and AI modeling. Have fun reading through it!