Most banks regularly face challenges caused by poor data quality and insufficient data volume. Distributed learning is an approach that leverages data across banks and offers concrete ways to mitigate these issues.
Data quality is a serious issue for banks trying to unleash the power of their data. Avaloq’s data modeling approach, the Enterprise-Wide Object Model (EWOM), provides a foundation that limits the impact of these issues. In this blog post, we outline how you can build on the foundation that Avaloq’s EWOM provides by using a concept called 'distributed learning'. With this approach, you can benefit from cross-industry learning, enabling better predictive models that can be applied at your bank or wealth manager.
We will illustrate these points based on a particular use case in the wealth management area: a recommendation engine to support relationship managers in generating smart trade ideas that are tailored toward the investor. This recommendation engine has two immediate goals: driving execution-only investors toward advisory mandates and increasing the proximity between discretionary managed investors’ preferred portfolios and the house views (e.g. robo-advisory). In the long run, both outcomes should lead to an increase in client retention.
Data inconsistencies in the banking sector and Avaloq’s EWOM
Banks and wealth managers have traditionally operated in silos: data was generated for specific business sectors and jurisdictions, acted upon and then stored locally. Little effort was put into maintaining quality and minimizing redundant or incorrect information between entities. Many critical pieces of information were stored in a free text format, making automated processing tedious, if not almost impossible.
Conversely, from day one, Avaloq designed its system architecture around the concept of a single unified Enterprise-Wide Object Model (EWOM). The EWOM has allowed all Avaloq clients to accurately model their internal and external data in a consistent manner. The EWOM limits the use of free text and arbitrary linkages of attributes. This has introduced a level of consistency that is missing in other banks and wealth managers that are not using such a model.
Leveraging information across banks
As the number of Avaloq clients increased, so did the level of data consistency, not just within these banks and wealth managers but also across them. This led to the idea of using data collected across banks and wealth managers, provided they consent, to generate insights that are valuable from a business point of view. Such insights could never be gained by analyzing information within a single bank alone. For example, in the context of our use case (a recommendation engine) in which we tailor trade ideas to the investor based on their profile, we utilize information regarding the preferences of similar investors at other banks. This enhances the quality of the recommendations, especially for banks with a small client base to learn from.
A key question arising at this point is: how would we handle data privacy issues? Undoubtedly, banks and wealth managers would not want their data to be mixed with that of other banks. At Avaloq, we are investing in a solution to this problem using distributed learning.
Distributed learning is broadly defined as an instructional model that allows instructors, students and content to be located in different non-centralized locations. In the context of data science and predictive modelling, this means that we can develop models on data located across banks, as long as a minimum level of consistency is ensured and a method is in place that can act as an instructor to combine the learning centrally. If successful, this would address the concern of data privacy, while simultaneously enabling the generation of insight across banks and wealth managers.
This concept was implemented in 2017 by Google and is now being used by the likes of Facebook and numerous healthcare companies which handle extremely sensitive medical data that is prone to privacy concerns. A concrete application of distributed learning is Gboard: a virtual keyboard app with a predictive typing engine. This app is trained on data generated by all its users in a distributed fashion, meaning the data never leaves the user’s device, hence preserving privacy.
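To make the idea of an "instructor" that combines learning centrally more tangible, here is a minimal sketch of one widely used combination rule, federated averaging. This is an assumption chosen for illustration: the text above does not state which combination method is used. The function name, the bank names and all numbers are hypothetical. Each participant trains locally and shares only its model weights, and the central step averages those weights in proportion to how much data each participant trained on.

```python
def federated_average(client_weights, client_sizes):
    """Average model weights, weighting each client by its number of samples.

    client_weights: one weight list per participant (all the same length).
    client_sizes:   number of training samples behind each weight list.
    Only weights are exchanged; the raw data stays with each participant.
    """
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("at least one client must contribute samples")
    n_weights = len(client_weights[0])
    return [
        sum(w[i] * size for w, size in zip(client_weights, client_sizes)) / total
        for i in range(n_weights)
    ]


# Hypothetical example: three banks with locally trained two-weight models.
local_models = {
    "bank_a": ([0.2, -0.1], 100),
    "bank_b": ([0.4, 0.0], 300),
    "bank_c": ([0.1, 0.3], 600),
}
weights = [model for model, _ in local_models.values()]
sizes = [size for _, size in local_models.values()]
global_model = federated_average(weights, sizes)
print(global_model)
```

In this sketch, the bank with the most data has the largest influence on the combined model, while no bank ever sees another bank's records.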
Distributed learning for recommendation engines
At Avaloq, we have developed a smart recommendation engine to generate intelligently tailored trade ideas that appeal to the investor. While research on this topic continues at a fast pace, interim findings are promising. They suggest that a supervised machine learning approach that simultaneously considers investor data (including portfolio-specific data), relationship manager data and asset performance data when making a trade recommendation could be very beneficial for wealth advisory.
The benefit of this approach is that the analysis doesn’t just focus on investor datasets in isolation, but is able to combine several datasets – including, for example, relationship manager and asset related information. In other words, the recommendation engine can be tasked with answering the following:
Given investor α, portfolio β, relationship manager γ and asset δ, what is the probability that the combination is a good match?
Mathematically, this is equivalent to asking: what is the probability that the relationship manager communicates the recommendation and the investor buys it, given the current state of their portfolio?
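As a rough illustration of the question above, the following sketch scores a single (investor, portfolio, relationship manager, asset) combination with a logistic model and ranks candidate assets by that score. It is a simplified stand-in, not the engine itself: the logistic form, the function names, the feature values and the weights are all assumptions made for this example. In practice the weights would be learned from historical recommendations and their outcomes.

```python
import math


def match_probability(investor, portfolio, manager, asset, weights, bias=0.0):
    """Estimate P(good match) for one combination with a logistic model."""
    # All four feature groups are considered at once by concatenating them.
    features = investor + portfolio + manager + asset
    if len(features) != len(weights):
        raise ValueError("expected exactly one weight per feature")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_assets(investor, portfolio, manager, assets, weights, bias=0.0):
    """Return (asset name, probability) pairs, best match first."""
    scored = [
        (name, match_probability(investor, portfolio, manager, feats, weights, bias))
        for name, feats in assets.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already normalized feature values.
investor = [0.4, 0.7]   # e.g. age bucket, risk appetite
portfolio = [0.3]       # e.g. share of complex instruments held
manager = [0.5]         # e.g. how often this manager pitches new ideas
assets = {"bond_fund": [0.1], "structured_note": [0.9]}  # e.g. complexity
weights = [0.5, 1.0, -0.2, 0.3, 0.8]  # made-up values, not learned

for name, probability in rank_assets(investor, portfolio, manager, assets, weights):
    print(f"{name}: {probability:.2f}")
```

The ranking is what a relationship manager would see: the assets with the highest estimated match probability for this investor come first.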
To facilitate this, we had to determine relevant features that enable comparisons across the dataset. For investors, examples can include gender, domicile, age and income. By explicitly considering the full spectrum of relevant information and describing this in terms of specific features, we can leverage not only past information for a given investor, but also tap into information about investors who have similar behaviours and trade preferences.
We can illustrate the latter point based on an example: Consider an investor who routinely trades simple securities. After some time, this investor starts developing an affinity toward more complex instruments. If the recommendation engine were to only consider this investor’s history to generate trade ideas, it would be unable to detect the newfound interest in complex instruments at this point. On the other hand, if the engine were to also consider the history of similar investors, it could anticipate this change in affinity.
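The example above can be sketched in a few lines. The code below blends an investor's own observed affinity for complex instruments with the average affinity of their most similar peers, using cosine similarity over feature vectors. The similarity measure, the equal blending weight, the field names and the data are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equally long feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def blended_affinity(target, others, k=2, own_weight=0.5):
    """Mix the investor's own affinity with that of the k most similar peers."""
    ranked = sorted(
        others,
        key=lambda other: cosine(target["features"], other["features"]),
        reverse=True,
    )
    peers = ranked[:k]
    if not peers:
        return target["complex_share"]
    peer_average = sum(p["complex_share"] for p in peers) / len(peers)
    return own_weight * target["complex_share"] + (1.0 - own_weight) * peer_average


# Hypothetical investor who has only traded simple securities so far.
investor = {"features": [1.0, 0.0, 1.0], "complex_share": 0.0}
others = [
    {"features": [1.0, 0.0, 1.0], "complex_share": 0.6},
    {"features": [1.0, 0.1, 0.9], "complex_share": 0.4},
    {"features": [0.0, 1.0, 0.0], "complex_share": 0.0},
]
print(blended_affinity(investor, others))
```

Looking at the investor's own history alone would give an affinity of zero. Including the two most similar peers lifts the estimate above zero, which is how the engine could anticipate an emerging interest in complex instruments.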
But what if a bank or wealth manager only has a small client base? Presumably, the models developed would be limited in effectiveness due to this small sample size. This would be especially true given a large potential asset universe (>1000 assets). Ultimately, such models benefit from large diverse datasets.
Distributed learning may help in this case. If the defined features are the same across banks, which is effectively the case for all Avaloq clients as they benefit from the Avaloq EWOM, we could set up a two-step learning process (Figure 1). In the initial learning phase, we would create a training dataset based on anonymized data from all of the banks except the bank that the model is ultimately created for. A model would then be trained on this dataset (see item 1, the shared environment described in the bottom part of Figure 1) and transferred from the shared environment to the bank-specific environment, where client-identifying attributes exist. In the second learning phase, a bank-specific enhancement would be developed, during which the pre-trained model is augmented with the data of the bank it is being developed for. This two-step process makes use of both the anonymized data across banks and the non-anonymized data of the bank the model is ultimately developed for, while preserving data privacy throughout.
Figure 1: Distributed learning. Leveraging data across banks to provide superior predictive power while preserving data privacy.
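The two-step process can be sketched as a pre-training step followed by a warm-started fine-tuning step. This is one plausible reading of "augmenting" the pre-trained model, not a description of the actual implementation. The logistic regression model, the plain stochastic gradient descent training loop, the feature columns and all data values are assumptions chosen to keep the example small.

```python
import math


def predict(weights, row):
    """Logistic prediction for one feature row."""
    z = sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, weights=None, epochs=200, lr=0.1):
    """Fit logistic regression with stochastic gradient descent.

    Passing `weights` warm-starts training from an existing model;
    the input list is copied, so the original model is left untouched.
    """
    n_features = len(rows[0])
    w = list(weights) if weights is not None else [0.0] * n_features
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = label - predict(w, row)
            for i in range(n_features):
                w[i] += lr * error * row[i]
    return w


# Hypothetical columns: [constant term, investor risk appetite, asset complexity]
# Phase 1, shared environment: anonymized rows pooled from the other banks.
pooled_rows = [[1.0, 0.2, 0.1], [1.0, 0.9, 0.8], [1.0, 0.1, 0.7], [1.0, 0.8, 0.2]]
pooled_labels = [0, 1, 0, 1]
shared_w = train(pooled_rows, pooled_labels)

# Phase 2, bank-specific environment: continue training on the bank's own data.
bank_rows = [[1.0, 0.3, 0.9], [1.0, 0.4, 0.1]]
bank_labels = [1, 0]
bank_w = train(bank_rows, bank_labels, weights=shared_w, epochs=20, lr=0.05)

print(predict(bank_w, [1.0, 0.5, 0.5]))
```

Only the trained weights cross the boundary between the two environments in this sketch. The bank's own rows are used exclusively in the second call to `train`, which runs inside the bank-specific environment.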
With this approach, we achieve the following:
- the use of anonymized data across banks for model development
- bank-specific enhancements, providing each bank a tailored model
- sensitive client data that is never mixed with other data and remains securely held in the bank’s environment
In this way, banks and wealth managers can learn not only from their own data but also from cross-industry datasets, and deliver a better, more tailored set of recommendations to their clients. Their clients get better personalized service and more relevant investment proposals.
We are excited about our clients using this approach for the recommendation engine and believe that the large financial client base of Avaloq puts it in an exceptional position to utilize distributed learning to its fullest potential to provide solutions to banks and wealth managers.
In our data science whitepaper, we allude to the recent disruptions in the financial industry created by the advent of advanced analytics, artificial intelligence (AI) and machine learning (ML). We also discuss the challenges in fully leveraging these technologies, be it the enforcement of organizational commitment to a data-driven working style, issues related to bespoke infrastructure, or inconsistencies and inaccuracies in datasets that happen to be stored in siloed systems.