The use of artificial intelligence (AI) comes with responsibility. Transparency, explainability and fairness are essential principles that must be guaranteed, as must the high performance of the AI system. To meet these requirements, it is worth looking at fields with a tradition of verifiable processes. Even though these processes do not work flawlessly, security standards cannot be implemented without them. This is most evident in safety-critical and regulated industries such as medicine, but also in aerospace and finance.
Similar to how these fields need processes to meet the relevant requirements, a company that uses AI systems needs regulated processes through which it controls access to machine learning (ML) models, implements guidelines and legal requirements, tracks the interactions with the models and their results, and records on what basis a model was created. Collectively, these processes are referred to as model governance. Model governance processes should be implemented from the beginning in every phase of the ML life cycle (design, development and operations). The author has discussed the specific technical integration of model governance into the ML life cycle in more detail elsewhere.
Model governance: a must in the forest of rules and regulations
Model governance is not optional (see box “Model governance checklist”). For one thing, there are already existing regulations that companies in certain sectors have to comply with. The financial sector illustrates the importance of model governance well: lending systems or interest-rate-risk and pricing models for derivatives are risky and require a high degree of control and transparency. According to an Algorithmia study on the top trends in AI deployment for 2021, the majority of companies are bound by regulatory requirements: 67 percent of respondents have to comply with multiple regulations, and only 8 percent stated that they were not subject to any legal requirements.
The scope of the regulations is likely to increase further in the future: in April 2021, the EU published the draft of a regulation as the first legal framework for AI, which would supplement existing regulations. The draft divides AI systems into four risk categories (“unacceptable”, “high”, “limited”, “minimal”). The risk category defines the type and scope of the requirements placed on the respective AI system. AI software that falls into the high-risk category must meet the most stringent requirements.
Using machine learning comes with responsibilities and obligations. In order to meet these requirements, a company needs processes through which it
- controls access to ML models
- implements guidelines and legal requirements
- tracks the interactions with the ML models and their results
- records the basis on which a model was created

Model governance designates these processes in their entirety.
Checklist:
- Complete model documentation or reports, including the reporting of metrics using suitable visualization techniques and dashboards
- Versioning of all models to create external transparency (explainability and reproducibility)
- Complete data documentation to ensure high data quality and compliance with data protection
- Management of ML metadata
- Validation of ML models (audits)
- Continuous monitoring and logging of model metrics
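Several of these checklist items (versioning, metadata management, metric logging) can be supported by experiment-tracking tools. The following minimal sketch assumes MLflow as the tracking backend; the run name, tag and parameter values are hypothetical and only illustrate how training metadata, a metric and a versioned model artifact could be recorded.

```python
# Sketch: recording model metadata, a metric and a versioned model artifact
# with MLflow (assumed tooling; names and values are illustrative).
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="credit-scoring-v1"):       # hypothetical run name
    mlflow.set_tag("data_version", "2021-09-01")            # which data the model was trained on
    mlflow.log_param("C", 1.0)                               # training parameter
    model = LogisticRegression(C=1.0).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("test_accuracy", acc)                  # metric for reporting and monitoring
    mlflow.sklearn.log_model(model, "model")                 # versioned model artifact
```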
The requirements for high-risk systems include the following aspects: robustness, security, accuracy, documentation and logging, as well as appropriate risk assessment and risk mitigation. Further requirements are high quality of the training data, non-discrimination, traceability, transparency, human oversight, and the need for a conformity check and proof of conformity with the AI regulation by means of a CE marking (see box “Plan it Legal”). Examples of ML systems in this category are private and public services (such as credit checks) or systems used in school or vocational training to decide on an individual’s access to education and career (such as the automated evaluation of exams).
The conformity of high-risk AI systems with the AI regulation will become a prerequisite for placing them on the market in the EU. It can be demonstrated by a CE marking. The EU will also adopt standards; compliance with them will give rise to a presumption of conformity with the regulation.
The responsible authorities are to develop “sandboxing schemes”, i.e. specifications for secure test environments, for the comprehensive tests required under the AI regulation. The conformity check for AI takes an ex-ante perspective, but nevertheless has similarities with the data protection impact assessment under the GDPR. More information can be found in the blog entry by Dr. Bernhard Freund at planit.legal: “The EU’s AI law – draft and discussion status”.
Achieving compliance with the European AI regulation
Since the regulation is intended to apply not only to EU-based companies and individuals but to any company offering AI services within the EU, the law would have a scope similar to that of the GDPR. The regulation still has to be approved by the European Parliament and pass through the legislative processes of the individual member states; if it does, the law will come into force in 2024 at the earliest. High-risk systems must then undergo a conformity assessment against the AI requirements during development in order to be registered in an EU database. In the last step, a declaration of conformity is required so that the AI systems receive the CE marking that allows their providers to place them on the market.
Regulation, however, is not the only decisive aspect for model governance processes: even models used in less regulated contexts cannot do without model governance. Beyond meeting legal requirements, companies have to avert economic losses, reputational damage and legal difficulties. ML models that provide a marketing department with information about the target group can lose precision in operation and thus provide an incorrect basis for important follow-up decisions; they therefore represent a financial risk. Model governance is consequently required not only to meet legal requirements, but also to ensure the quality of ML systems and to reduce business risks.
Model governance as a challenge
The emerging EU requirements, existing regulations and corporate risks make it necessary to implement model governance processes right from the start. For many companies, however, model governance only becomes important once ML models go into production and have to comply with legal regulations. In addition, the abstract nature of legal requirements presents companies with the challenge of practical implementation: according to the Algorithmia study already cited, 56 percent of those surveyed name the implementation of model governance as one of the greatest challenges in bringing ML applications into production successfully in the long term. The figures from the “State of AI in 2021” study on the risks of artificial intelligence fit this picture: 50 percent of the companies surveyed cite compliance with legal regulations as a risk factor; others name deficiencies in explainability (44 percent of those surveyed), reputation (37 percent) and equity and fairness (30 percent) as relevant risk factors.
Audits as standardized review processes in the model governance framework
An important part of model governance are audits: tools to check whether AI systems comply with company policies, industry standards or regulations. There are internal and external audits. The Gender Shades study, discussed by the author in the Heise article “Ethics and artificial intelligence: a new way of dealing with AI systems”, is an example of an external audit process: it tested facial recognition systems from large providers with regard to their accuracy by gender and ethnicity and found that the models’ precision differed depending on ethnicity and gender.
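The core of such an accuracy audit can be automated. The following sketch assumes a pandas DataFrame with hypothetical columns y_true, y_pred and group; it computes the accuracy per demographic group and flags large gaps. It illustrates the idea of a per-group check rather than any specific audit standard.

```python
# Sketch of a per-group accuracy check as part of an audit.
# Assumes a DataFrame with hypothetical columns: group, y_true, y_pred.
import pandas as pd

def accuracy_by_group(df: pd.DataFrame, max_gap: float = 0.05) -> pd.Series:
    """Return the accuracy per group and warn if the gap exceeds max_gap."""
    acc = df.groupby("group").apply(lambda g: (g["y_true"] == g["y_pred"]).mean())
    gap = acc.max() - acc.min()
    if gap > max_gap:
        print(f"Audit warning: accuracy gap of {gap:.2%} between groups")
    return acc

# Illustrative, invented predictions of a fictitious recognition model
df = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B"],
    "y_true": [1, 0, 1, 1, 0, 1],
    "y_pred": [1, 0, 1, 0, 0, 0],
})
print(accuracy_by_group(df))
```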
However, this view from the outside is limited, since external test processes only have access to model results, not to the underlying training data or model versions. These are valuable sources that companies must include in an internal audit process. Such processes are designed to enable critical reflection on the potential impact of a system. Before that, however, some basics of AI systems need to be clarified.
Peculiarities of AI systems
To be able to test AI software, it is important to understand how machine learning works. Machine learning is a set of methods with which computers make and improve predictions or behaviors based on data. To build such a predictive model, the learning algorithm has to find a function that produces an output (label) for a given input. To do this, it requires training data that contains the appropriate output for each input. This type of learning is called “supervised learning”. In the training process, the model uses mathematical optimization methods to find a function that approximates the unknown relationship between input and output as well as possible.
An example of classification is sentiment analysis, which examines whether tweets express positive or negative moods (sentiments). In this case, an input would be a single tweet, and the associated label would be the coded sentiment for that tweet (−1 for negative, 1 for positive sentiment). In the training process, the algorithm uses this annotated training data to learn how the input data is related to the label. After training, the algorithm can then independently assign new tweets to a class.
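As a minimal sketch of this supervised learning workflow, the following example trains a simple sentiment classifier with scikit-learn on a handful of invented example tweets (1 for positive, −1 for negative sentiment). In practice, the training data would be far larger and carefully annotated; the pipeline shown is one possible choice, not a prescribed approach.

```python
# Minimal supervised-learning sketch: a sentiment classifier for tweets.
# The example tweets and labels are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "I love this product, works great!",
    "Fantastic support, thank you!",
    "Terrible experience, never again.",
    "This is the worst update ever.",
]
labels = [1, 1, -1, -1]  # 1 = positive, -1 = negative sentiment

# Pipeline: turn raw text into TF-IDF features, then fit a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(tweets, labels)

# After training, the model assigns new, unseen tweets to a class on its own.
print(model.predict(["What a great feature!", "Really disappointing."]))
```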
More complex components in machine learning
An ML model thus learns its decision logic in the training process, rather than having that logic explicitly defined in code as a sequence of if-then rules, as is typical in software development. This fundamental difference between traditional and AI software means that methods of classic software testing cannot be transferred directly to AI systems. Testing is further complicated by the fact that, in addition to the code, there are also the data and the model itself, and all three components are mutually dependent according to the change-anything/change-everything principle (see “Hidden Technical Debt in Machine Learning Systems” for more on this).
For example, if the data in the productive system differs from the data with which a model was trained (distribution shift), the performance of the model drops (model decay). In this case, the model has to be retrained quickly with fresh training data and redeployed. To make matters worse, testing AI software is an open field of research with no consensus and no established best practices.
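Continuous monitoring can detect such distribution shifts automatically. The sketch below compares a single numeric feature of the live data with the training data using a two-sample Kolmogorov-Smirnov test from SciPy; the generated feature values and the significance threshold are illustrative assumptions, and a real monitoring setup would run such checks per feature on a schedule.

```python
# Sketch: detecting a distribution shift in one numeric feature
# by comparing training data with live (production) data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # feature at training time
live_feature = rng.normal(loc=0.5, scale=1.0, size=1_000)   # shifted feature in production

statistic, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:  # illustrative significance threshold
    print(f"Distribution shift detected (KS={statistic:.3f}, p={p_value:.4f}) "
          "- consider retraining the model with fresh data.")
```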