New Algorithms for Relational Learning: What Deep Learning Can't Do

The idea of storing data in relational structures dates back to the 1970s. Today, relational data forms the backbone of every modern business. Mountains of company data are piling up in databases, and this data could play a central role in closing the AI gap that decision-makers increasingly recognize. But despite all the innovative spirit, it is important to remember: until now, creating value from relational data with machine learning (ML) has only been possible with enormous effort. This circumstance blocks even large companies from putting machine learning and artificial intelligence (AI) to work in business applications.
The key lies in the research field of relational learning, which so far has hardly played a role in practice. A new class of algorithms promises to change that: it transfers the central concept of feature learning from deep learning to relational data structures and thus makes company data usable for modern machine learning algorithms.
This post is part of a series of articles to which heise Developer invites young developers to contribute – to report on current trends, developments and personal experiences. The "Young Professionals" series is published monthly.
Are you a "Young Professional" yourself and want to write a (first) article? Send your suggestion to the editors: developer@heise.de. We're here to help you write.
In their everyday project work, data scientists often face a classic task: based on relational source data from a database such as MySQL, they are to develop a machine learning model for predictive analytics. The resulting models are used across industries and can be designed for a wide variety of applications, such as forecasting customer churn at financial service providers, sales and demand forecasts in retail, or predictive maintenance in manufacturing.
The common problem when developing predictive models on relational data is that this data is not directly suitable as input for ML models. Data scientists must spend up to 90 percent of their time on manual work to transform the raw relational data into a representation that models such as XGBoost can process.
Developing features is time-consuming, complicated and requires deep expertise. Applied machine learning is feature engineering.

– Andrew Ng, Deep Learning

Feature learning promises to automate these work steps and thus make relational data directly usable for machine learning. The authors' experience from everyday data science work shows that the method avoids error-prone and time-consuming manual steps and leads to better ML models.

What is relational learning?

To train a forecast model using supervised learning, data scientists need a data set with a designated target variable. In many textbook examples and data science competitions, this data set is simply a flat table. This table forms the statistical population of the model. Each row of the table corresponds to an independent observation, which is assigned a target variable – also called the output value – and a fixed number of measurable, observable attributes (see Fig. 1).
Data scientists refer to these attributes as features; they form the input values of the model. A real estate price forecast illustrates the relationship between target variable and features: the target variable is the value of the property, and one possible feature is the number of square meters of each property.
In the training phase, the algorithm learns the parameters of the model and thus a generalized, functional relationship between input and output data. Applied to new input data, the learned model can then forecast output values that are still unknown.

Example of a population table
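
The following minimal sketch illustrates this setup in Python, using the real estate example; pandas and scikit-learn are assumed, and all table names and values are invented for illustration only.

```python
# Minimal sketch of a flat population table for supervised learning.
# All values are made up; pandas and scikit-learn are assumed.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Population table: each row is one independent observation (one property).
population = pd.DataFrame({
    "square_meters": [54, 120, 87, 65, 150],                          # feature
    "rooms":         [2, 4, 3, 2, 5],                                 # feature
    "price":         [210_000, 540_000, 380_000, 250_000, 690_000],   # target variable
})

X = population[["square_meters", "rooms"]]   # input values (features)
y = population["price"]                      # output value (target variable)

# Training phase: the algorithm learns a functional relationship between X and y.
model = GradientBoostingRegressor().fit(X, y)

# Applied to new input data, the model forecasts the still unknown output value.
new_property = pd.DataFrame({"square_meters": [100], "rooms": [3]})
print(model.predict(new_property))
```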

However, in many use cases the input data is not available exclusively as a flat table. Particularly in company applications, it is more efficient to organize the information generated in business processes in relational data structures. Statistics show how widespread this is: according to the DB-Engines Ranking, seven of the ten most popular database systems are relational. If the input values of a forecasting model are spread across several related tables, the use case falls into a sub-area of statistical relational learning.
From a relational learning perspective, the relational data schema contains a second class of tables in addition to the population table: peripheral tables, whose rows contain observations of further attributes. Peripheral tables can have many-to-many relationships with the population table; in the simplest case, a row from the population table has a 1:n relationship to rows in the peripheral table. In relational learning, it is common for each observation row in the population table to have a different number of associated rows in the peripheral tables (see Fig. 2).
Since a database in practice often contains a large number of tables, the relationships between the tables add complexity; the well-known star and snowflake schemas are common ways of visualizing such structures. An example is a customer churn forecast: the target variable encodes whether a customer will place another order within a certain period of time. A peripheral table could then contain a varying number of further observations for each customer number, for example past purchases or digital customer activities.
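
A small, purely illustrative sketch in Python (pandas assumed) shows what such a structure can look like for the churn example; all table and column names as well as the values are invented.

```python
# Hypothetical sketch of a population table and a peripheral table
# in a 1:n relationship. All names and values are made up; pandas assumed.
import pandas as pd

# Population table: one row per customer, including the target variable.
population = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "churn":       [0, 1, 0],   # target: will the customer place another order?
})

# Peripheral table: past purchases, several rows per customer.
purchases = pd.DataFrame({
    "customer_id":   [1, 1, 1, 2, 3, 3],
    "purchase_date": pd.to_datetime(["2023-01-05", "2023-02-14", "2023-03-01",
                                     "2022-11-20", "2023-02-02", "2023-03-10"]),
    "value":         [49.90, 120.00, 15.50, 230.00, 75.25, 12.99],
})

# Each customer has a different number of associated rows in the peripheral table.
print(purchases.groupby("customer_id").size())
```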
But how does the way data scientists work change as soon as they have to develop a forecast model from training data with relational structures? What do they have to do to be able to develop a forecast model at all?
How machine learning with relational data has worked so far
In the absence of contemporary self-learning algorithms that can process relational input data directly, data scientists have no option but to first convert the existing relational data into a compatible representation. But why not simply leave the peripheral tables out? That would be by far the worst possible approach: it is precisely these tables that usually contain information closely related to the target variable, such as the historical transactions of a customer in the churn example. If this information were not available to the machine learning model in the training phase, the forecast quality of the resulting model would suffer.
Feature engineering on relational data usually takes place in programming languages such as SQL, Python or R (Fig. 2).

Once the data scientists know all the information relevant to the forecast, they can replace the relational relationships between the population table and the peripheral tables with scalar feature values and then pass these values to the forecast model. Resolving the 1:n relationships between an observation in the population table and the peripheral tables is called feature engineering. For this purpose, data scientists program aggregation functions with additional conditions.
The challenge in feature engineering lies in identifying the relevant aggregations and conditions. In the customer churn example, one feature might correspond to a COUNT aggregation over all purchases a customer made in the last 90 days; alternatively, a SUM aggregation over the value of the purchases of the past 45 days could be considered. These are two relatively simple features out of an almost endless number of combinations of aggregation functions and arbitrarily complex conditions. It is not clear in advance which of the possible features carry the information relevant to the forecast, so until now this has had to be tried out manually (see Fig. 3).
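
A simplified sketch in Python (pandas assumed) shows how these two features could be extracted by hand, reusing the invented population and purchases tables from the sketch above; the cutoff date is an arbitrary assumption for illustration.

```python
# Hypothetical sketch of manual feature engineering in pandas. Each feature
# resolves the 1:n relationship with one aggregation function plus a
# time-window condition; the cutoff date is an assumption for illustration.
import pandas as pd

reference_date = pd.Timestamp("2023-03-31")

# Feature 1: COUNT of all purchases a customer made in the last 90 days.
last_90 = purchases[purchases["purchase_date"] >= reference_date - pd.Timedelta(days=90)]
count_90d = (last_90.groupby("customer_id").size()
             .rename("purchase_count_90d").reset_index())

# Feature 2: SUM of the purchase values of the past 45 days.
last_45 = purchases[purchases["purchase_date"] >= reference_date - pd.Timedelta(days=45)]
sum_45d = (last_45.groupby("customer_id")["value"].sum()
           .rename("purchase_sum_45d").reset_index())

# Replace the relational relationship with scalar feature values on the
# population table – ready to be passed to a model such as XGBoost.
feature_table = (population
                 .merge(count_90d, on="customer_id", how="left")
                 .merge(sum_45d, on="customer_id", how="left")
                 .fillna(0))
print(feature_table)
```

In a real project, hundreds of such aggregation-condition combinations have to be written, evaluated and maintained by hand, which is exactly the effort described in the following paragraphs.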

The process of feature engineering

In practice, feature engineering turns out to be a time-consuming process: it requires close cooperation between domain experts with process knowledge and data scientists with methodological knowledge in order to identify forecast-relevant influencing factors, translate them into logic, and then extract the features based on that logic. Good predictive models often require a three-digit number of features, each of which can span up to a few hundred lines of code. Feature engineering is therefore of great importance in machine learning projects: the features used are decisive for the forecast quality.
At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.
– Pedro Domingos, A Few Useful Things to Know About Machine Learning
Although the training of the forecast model can be largely automated using AutoML applications, the forecast quality depends primarily on the discretionary assumptions that teams make in the feature engineering process.
In addition, it is often unavoidable that the importance of individual features decreases over the course of the model's operation or that new features become relevant. This phenomenon is known as feature drift. It therefore requires continuous monitoring and adjustment of the features as well as regular retraining of the model based on them.
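
One possible way to watch for feature drift – shown here purely as an illustrative sketch, not as the method discussed in this article – is to compare a feature's current distribution with its distribution at training time, for example via a population stability index (numpy assumed).

```python
# Hypothetical drift check: compare the current distribution of a feature
# with the distribution seen at training time using a population stability
# index (PSI). This is a common heuristic, not the article's method.
import numpy as np

def population_stability_index(train_values, current_values, bins=10):
    """Rough PSI between the training-time and the current feature distribution."""
    edges = np.histogram_bin_edges(train_values, bins=bins)
    train_pct = np.histogram(train_values, bins=edges)[0] / len(train_values)
    current_pct = np.histogram(current_values, bins=edges)[0] / len(current_values)
    # Guard against empty bins before taking the logarithm.
    train_pct = np.clip(train_pct, 1e-6, None)
    current_pct = np.clip(current_pct, 1e-6, None)
    return float(np.sum((current_pct - train_pct) * np.log(current_pct / train_pct)))

# PSI values above roughly 0.25 are often read as a signal to re-examine the
# feature and retrain the model.
```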
