Data Science

What is Data Science?

Data science is generally understood to mean the analysis of data and the creation of useful insights by combining domain expertise with statistics, mathematics, and programming skills.

Data scientists use artificial intelligence (AI) and machine learning to process information such as tabular data, text, images, video, or audio. These methods can serve different goals. One is the development of systems that mimic human perception, which enables tasks as diverse as defect categorization in products, machine translation, and autonomous driving. Other applications use complex datasets (e.g., process and production data) that are very difficult for humans to understand because of the volume of data or the number of interdependent input parameters. With the help of AI and machine learning, correlations can be found even in these datasets.

In the past, data science had the reputation of being applicable only to (very) large datasets. However, modern data-centric approaches make it possible to use small datasets as well. To this end, data are carefully selected, combined, and pre-processed by experts. For example, classification algorithms can be successfully trained on fewer than 100 examples per class. With my background in physics and industrial R&D, I am particularly interested in use cases that rely on the application of domain knowledge.

What is Machine Learning?

Machine learning is a branch of artificial intelligence that aims to fit the parameters of empirical models to a known dataset.

The model should capture the underlying patterns in the known data rather than memorize individual data points (generalization), so that it can predict previously unseen data points that follow the same patterns with as little deviation as possible.
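As a minimal sketch of what "fitting parameters to a known dataset" can look like, the following Python example fits a straight line by ordinary least squares and then checks it against points that were not used for fitting. The data values and the function names `fit_line` and `predict` are illustrative assumptions, and a straight line is only one of many possible empirical models.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(params, x):
    """Evaluate the fitted line at a new input x."""
    slope, intercept = params
    return slope * x + intercept


# Known (training) data: made-up values that roughly follow y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]

params = fit_line(train_x, train_y)

# Previously unseen points that follow the same pattern (also made up).
test_x = [5.0, 6.0]
test_y = [11.0, 13.0]
errors = [abs(predict(params, x) - y) for x, y in zip(test_x, test_y)]

print("fitted parameters:", params)
print("largest deviation on unseen points:", max(errors))
```

The two fitted numbers (slope and intercept) are the model parameters. The deviation on the unseen points is a simple measure of how well the model generalizes.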

What is Deep Learning?

Deep learning is machine learning with neural networks.

A neural network represents a large mathematical equation that describes the relationship between input and target variables.
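To illustrate the idea of a network as one large equation, here is a minimal sketch of a network with a single hidden layer, written out as y = w2 · tanh(w1 x + b1) + b2. The weights and biases are hand-picked, hypothetical values. In practice they would be fitted to data, and real networks have far more parameters.

```python
import math


def forward(x, w1, b1, w2, b2):
    """Evaluate y = w2 . tanh(w1 x + b1) + b2 for one input vector x."""
    # Hidden layer: one weighted sum per row of w1, passed through tanh.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    # Output layer: a single weighted sum of the hidden values.
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Hypothetical parameters for 2 inputs, 2 hidden units, and 1 output.
w1 = [[0.5, -0.3], [0.8, 0.2]]
b1 = [0.0, -0.1]
w2 = [1.0, -1.5]
b2 = 0.2

y = forward([1.0, 2.0], w1, b1, w2, b2)
print("network output:", y)
```

Every number in `w1`, `b1`, `w2`, and `b2` is a parameter of the equation. Training a network means adjusting these values so that the output matches the target variables as closely as possible.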

Artificial neural networks were originally inspired by simplified models of biological neurons. Contrary to popular belief, however, the architectures of today’s neural networks bear little resemblance to the human brain.

Life cycle of a data science project

After an initial meeting and brainstorming session to identify the business problems to be solved, a data science project typically proceeds as follows:

Problem identification

Obtain clear agreement on project goals, requirements, and commitments.

Data review and exploratory data analysis

Can relevant features be extracted from existing data? Are additional data needed? What cost and effort are required to obtain them?

Creation of an initial model

Enable a quick first estimate of the complexity and expense of the project.

Iterative optimization of the model and dataset using training data

Achieve the performance specified for the model.

Deployment of a proof-of-concept model and testing with previously unknown application data

Verify the performance of the model in the application environment.

Further monitoring of the model

If necessary, train the model further using new training data.

Result:

An AI algorithm tailored to your application, trained on your data, and fully integrated into your daily processes.
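The monitoring step in the life cycle above can be sketched as a simple check: compare the model's predictions on newly collected data with the values actually observed, and flag the model for further training once the error exceeds an agreed limit. The function names, the choice of mean absolute error as the metric, the threshold, and all numbers below are assumptions for illustration. A real project would use the performance measure agreed on during problem identification.

```python
def mean_absolute_error(predictions, targets):
    """Average absolute difference between predictions and observed values."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def needs_retraining(predictions, targets, max_error):
    """Return True when the error on new data exceeds the agreed limit."""
    return mean_absolute_error(predictions, targets) > max_error


# Made-up monitoring data: model predictions versus values observed later.
recent_predictions = [10.2, 11.9, 15.4]
recent_targets = [10.0, 12.0, 14.0]

flag = needs_retraining(recent_predictions, recent_targets, max_error=0.5)
print("retraining recommended:", flag)
```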