A Brief Overview on the Methods of Impact Evaluation of Public Policies

Abstract
Impact assessment is an important tool for the formulation and implementation of evidence-informed public policies. The primary objective of this article is to present a brief review of the methods most commonly used to evaluate the impact of public policies; the secondary objective is to present examples of the application of these methods in the evaluation of programs that promote physical activity. Quasi-experimental methods are an important alternative for drawing causal inferences about the impact of policies and programs, especially when participation in these interventions does not occur randomly. In this sense, we highlight the propensity score matching method and the difference-in-differences estimator, which can be used alone, combined with each other, or combined with other methods to generate valid and robust estimates of the causal effect. At the end of the article, the application of these methods to evaluate the impact of physical activity programs in Brazil and the United States is presented, emphasizing their versatility both for comparing groups of aggregated units (such as municipalities) and for verifying the effect of an intervention on groups of individuals.


Introduction
Chronic non-communicable diseases (NCDs) are a global public health problem because of their high morbidity and mortality [1].
These diseases also cause temporary and permanent physical disability, reduce individuals' quality of life, and account for high public spending on treatment [2,3].
Among the non-pharmacological strategies for the prevention and control of NCDs, the adoption of healthy behaviors stands out, especially regular physical activity [4]. Evidence indicates that 150 to 300 minutes a week of moderate physical activity reduces the risk of developing and dying from NCDs [5,6], which has led policymakers to develop strategies that encourage the population to adopt more active lifestyles. Other studies indicate that if the number of insufficiently active individuals decreased by 25%, there would be 1.3 million fewer deaths from NCDs each year [7]. Although regular physical activity reduces the risk of illness and death from NCDs, few studies have evaluated the impact of public policies aimed at increasing the population's adherence to more active and healthier lifestyles.
The evaluation of the impact of public policies can help to demonstrate empirically the degree to which a given intervention (and only it) changed the outcome of a variable of public interest [8]. In addition, impact assessment makes it possible to determine whether the objectives proposed for an intervention were achieved and contributes to the (re)formulation of evidence-informed policies [7]. Knowledge and use of robust impact assessment methods allow academic research to be aligned with the demands of policymakers and to generate evidence that assists decision-making on the formulation, implementation, maintenance, or expansion of a policy, program, or public action [8,9]. These methods are recommended by the World Bank as support tools not only to assess the effectiveness of policies, but also to compare different ways of implementing the same intervention [8].
This characteristic makes impact assessment particularly useful for evaluating physical activity promotion programs, especially if they have different implementation strategies in the local context or if they have been proposed to serve different target audiences.

Challenges of Evaluating the Impact of Public Policies
Ideally, the impact of a public policy on physical activity should be evaluated with methods that compare groups of individuals (or population aggregates) that have similar characteristics and were randomly selected, but that differ in their exposure to the policy. Thus, the gold standard for evaluating the impact of public policies is the randomized controlled trial [9]. However, experiments of this kind are not always possible, whether for ethical, financial, or political reasons or because of the interest of individuals and institutions in participating in the policy [10]. The participation of an individual, or the adherence of an institution, city, province, or state to a given policy, does not always occur randomly [10,11], especially when it involves behavior change related to physical activity [12]. Voluntary adherence to a policy may therefore follow patterns unknown to the evaluator, which limits the ability to balance the characteristics of individuals exposed and not exposed to the intervention [10,11].
In other words, self-selection into participation can produce a treatment group (the group exposed to the policy) whose characteristics differ from those of the unexposed group. This can bias the estimated impact of the intervention, since the exposed group may consist of those who are more motivated, who have greater political will (in the case of entities, municipalities, and provinces), or who have better physical or technical conditions to adhere to physical activity [9-11]. When participants and non-participants in a policy or program cannot be selected randomly, that is, when selection bias is possible, additional assumptions are needed to identify the parameter of interest in the impact assessment [13]. In this context, the ignorability hypothesis states that there is no systematic bias when comparing groups with similar observable characteristics, because any relevant heterogeneity is captured by these observable variables, both in the group exposed to the policy and in the comparison group [14].
Therefore, the use of quasi-experimental research designs is recommended, since they make it possible to estimate the trajectory that the beneficiaries of a physical activity promotion policy would have followed had they decided not to adhere to it [10-12]. In this sense, impact assessment methods based on counterfactual analysis are the best alternative for estimating the specific causal effects of the policy on a given indicator of interest [8-11]. The potential outcomes model is widely used in public policy impact assessment, as it compares the results of an intervention with estimates of the results that would have been obtained without it [8,12]. Applying the method consists of comparing, before and after the intervention, a sample of units of analysis subjected to the intervention (treatment group) with another sample of units of analysis not subjected to it [8-11].
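The counterfactual comparison described above can be written compactly. The notation is an assumption introduced here for illustration and does not come from the original text: $Y_i(1)$ and $Y_i(0)$ denote the results of unit $i$ with and without the policy, $T_i$ indicates exposure, and ATT is the average effect of the policy on the treated. The second line shows why a naive comparison of exposed and unexposed groups recovers the effect only when the selection bias term is zero.

```latex
\begin{align*}
\text{ATT} &= E\left[\,Y_i(1) - Y_i(0) \mid T_i = 1\,\right] \\
E\left[Y_i \mid T_i = 1\right] - E\left[Y_i \mid T_i = 0\right]
  &= \text{ATT}
   + \underbrace{E\left[Y_i(0) \mid T_i = 1\right] - E\left[Y_i(0) \mid T_i = 0\right]}_{\text{selection bias}}
\end{align*}
```

Because $Y_i(0)$ is never observed for the treated units, the quasi-experimental methods discussed in this article can be read as different strategies for estimating $E[Y_i(0) \mid T_i = 1]$.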

The Propensity Score Matching Method
Evaluating the impact of public policies requires identifying what would have happened to the group exposed to a given intervention had it not been implemented (the definition of the counterfactual). However, a valid and robust counterfactual scenario cannot be constructed merely by selecting a group of individuals who were not exposed to the policy: more motivated individuals (or managers with greater commitment to health policies and greater political will) may be more likely to implement a physical activity program, which characterizes selection bias [15]. One way to minimize selection bias is the propensity score matching method, which builds a comparison group whose observable characteristics are similar to those of the group exposed to the policy (the treated group) [15,16]. Consequently, it becomes possible to identify and select at least one unit from the comparison group that represents a counterfactual result for each treated unit.
Under this scenario, the only remaining difference between the units evaluated is assumed to be participation or non-participation in the policy, since their other observable characteristics are similar [16].
In the propensity score matching method, after the treated and comparison groups are defined, regression models for binary data are estimated with Logit or Probit link functions [17] to determine the probability that a unit of analysis adheres to the policy, given a vector of characteristics from the period prior to exposure to the program ($X_{i,-1}$), which is given by:

$$P(Trat_i = 1 \mid X_{i,-1}) = \Phi(X_{i,-1}\beta)$$

Where: $Trat_i$ is a dummy variable equal to 1 if the individual is exposed to the policy and 0 otherwise; $\Phi$ is the cumulative logistic distribution function; $X_{i,-1}$ is a vector of k explanatory variables weighted by the inverse of the treatment probability; and $\beta$ is a vector of parameters associated with these variables.
To identify the best matching strategy, the next step of the method is to use the estimated propensity scores to compute weights that balance the units of the comparison group so that they become, on average, similar to the treated group. Matching algorithms such as nearest neighbor, kernel, and radius matching are usually used for this purpose.
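The two steps described above (estimating propensity scores with a logit model, then matching on them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the studies cited: the data are synthetic, the logit is fitted by plain gradient ascent rather than by a statistical package, matching is one-to-one nearest neighbor with replacement, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(X, treat, lr=0.5, epochs=500):
    """Fit a logit model by gradient ascent on the mean log-likelihood.
    Returns [intercept, b1, ..., bk]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, treat):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            err = ti - p
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    # Estimated probability of exposure for every unit.
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            for xi in X]

def nearest_neighbor_match(scores, treat):
    """Map each treated index to the comparison index with the closest
    propensity score (matching with replacement)."""
    controls = [i for i, t in enumerate(treat) if t == 0]
    return {i: min(controls, key=lambda c: abs(scores[c] - scores[i]))
            for i, t in enumerate(treat) if t == 1}

# Synthetic example: one pre-exposure covariate drives adherence (assumed).
random.seed(0)
X, treat = [], []
for _ in range(200):
    x = random.gauss(0, 1)
    X.append([x])
    treat.append(1 if random.random() < sigmoid(-0.5 + 1.0 * x) else 0)

beta = fit_logit(X, treat)
scores = propensity_scores(X, beta)
matches = nearest_neighbor_match(scores, treat)
```

Matching with replacement is chosen here only for brevity; kernel or radius matching would replace `nearest_neighbor_match` with a weighting over several comparison units.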

The Difference-in-Differences Method
The difference-in-differences estimator is a method used in quasi-experimental approaches to evaluate the impact of interventions, based on the hypothesis that the outcome variable of interest evolves in parallel in the treated and comparison groups [18]. The method assumes that the trajectory of the comparison group represents what would have happened to the outcome variable of the treated group in the absence of the intervention. It provides reliable estimates of the causal effect of the treatment as long as the unobservable characteristics of the treated and comparison groups vary uniformly over time [18,19]. The difference-in-differences method requires information about the treated and comparison groups before and after the implementation of the intervention, so that the parallel evolution of their trajectories can be described over periods of at least one year before and after the implementation of the program.
Thus, the treatment effect can be captured by calculating the difference between the two groups in the change in results observed before and after treatment [18].
Considering t = 0 as the period before the implementation of the intervention and t = 1 as the period after, the difference-in-differences estimator is described by:

$$DD = E\left[Y_{i,1} - Y_{i,0}\right] - E\left[Y_{j,1} - Y_{j,0}\right]$$

Where: $Y_i$ = result variable of a treated municipality i; $Y_j$ = result variable of a comparison group municipality j; the second subscript denotes the period t.
The difference-in-differences estimator minimizes the selection bias associated with unobservable characteristics and is therefore particularly useful for assessing the impact of physical activity promotion policies, as it controls for time-invariant characteristics such as the innate abilities of individuals or the political will of a public manager to implement a program.
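The calculation can be illustrated with a small numerical sketch. The figures below are hypothetical (an invented share of physically active adults in three treated and three comparison municipalities), and the function name `diff_in_diff` is an assumption introduced only for this example.

```python
def mean(values):
    return sum(values) / len(values)

def diff_in_diff(treated_before, treated_after,
                 comparison_before, comparison_after):
    """Change in the treated group minus change in the comparison group."""
    change_treated = mean(treated_after) - mean(treated_before)
    change_comparison = mean(comparison_after) - mean(comparison_before)
    return change_treated - change_comparison

# Hypothetical share (%) of physically active adults per municipality.
treated_before = [30.0, 32.0, 28.0]
treated_after = [38.0, 39.0, 34.0]
comparison_before = [25.0, 27.0, 26.0]
comparison_after = [28.0, 29.0, 30.0]

effect = diff_in_diff(treated_before, treated_after,
                      comparison_before, comparison_after)
print(effect)  # 7.0 (treated change) - 3.0 (comparison change) = 4.0

# A time-invariant advantage of the treated group (here +5 points in
# both periods) cancels out and leaves the estimate unchanged.
shifted = diff_in_diff([y + 5.0 for y in treated_before],
                       [y + 5.0 for y in treated_after],
                       comparison_before, comparison_after)
```

The second call illustrates the point made in the text: any fixed difference in levels between the groups drops out, so only differences in trends can bias the estimate.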

Combination of Public Policy Impact Assessment Methods
Importantly, impact assessments are subject to three types of bias. The first results from possible differences in observable characteristics between the treated and comparison groups, such as income, sociodemographic or epidemiological variables, level of training, and motor experience. The second may be associated with unobservable characteristics of the groups studied (such as individual motivation, political will, or technical capacity). The third arises when the groups cannot be compared because of the absence of common support, that is, when the conditional density functions of the observable characteristics of the treated and comparison groups do not overlap [11,16]. Biases related to observable and unobservable characteristics and the absence of common support can lead to inaccurate conclusions about the impact of a public policy.
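The third type of bias can be checked directly on estimated propensity scores. The sketch below uses a simple minimum-maximum rule, which is one common way to define the region of common support and is an assumption here rather than the rule used in the cited studies; the scores and function names are hypothetical.

```python
def common_support(scores, treat):
    """Return the (low, high) interval where the propensity scores of
    the treated and comparison groups overlap (min-max rule)."""
    treated = [s for s, t in zip(scores, treat) if t == 1]
    comparison = [s for s, t in zip(scores, treat) if t == 0]
    low = max(min(treated), min(comparison))
    high = min(max(treated), max(comparison))
    return low, high

def trim_to_support(scores, treat):
    """Indices of the units whose score lies inside the overlap."""
    low, high = common_support(scores, treat)
    return [i for i, s in enumerate(scores) if low <= s <= high]

# Hypothetical propensity scores: four comparison units, four treated.
scores = [0.05, 0.20, 0.35, 0.50, 0.40, 0.60, 0.80, 0.95]
treat = [0, 0, 0, 0, 1, 1, 1, 1]

low, high = common_support(scores, treat)  # (0.40, 0.50)
kept = trim_to_support(scores, treat)      # [3, 4]
```

In this invented example only two of the eight units fall inside the overlap, which signals that most treated units have no comparable counterpart and that any estimate would rest on very few comparisons.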
The control of bias in quasi-experimental studies can be achieved by combining methods such as propensity score matching and difference-in-differences regression, which makes it possible to estimate the causal effect of a policy [19-21]. Impact assessment methods can be combined with each other or with other techniques to overcome the individual limitations of each method and increase the robustness of the results. The combination of propensity score matching and difference-in-differences, also known as double difference matching, improves the quality of results from non-experimental studies [20]: the difference-in-differences method minimizes possible selection biases due to time-invariant unobservable characteristics of the treated and comparison groups, while propensity score matching of units (or individuals) with similar characteristics reduces both the biases arising from the distribution of observable characteristics and the biases related to the absence of common support [21,22].
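A minimal sketch of the combined approach follows, assuming that propensity scores have already been estimated. Each treated unit is paired with the comparison unit that has the closest score, and the before-after change of the pair is differenced. The data, the dictionary keys, and the function name `matched_diff_in_diff` are hypothetical and serve only to illustrate the logic.

```python
def matched_diff_in_diff(units):
    """Average, over treated units, of (change in the treated unit)
    minus (change in its nearest-score comparison unit).

    Each unit is a dict with keys 'treat', 'score', 'before', 'after'.
    Matching is nearest neighbor with replacement.
    """
    treated = [u for u in units if u["treat"] == 1]
    comparison = [u for u in units if u["treat"] == 0]
    gaps = []
    for t in treated:
        c = min(comparison, key=lambda u: abs(u["score"] - t["score"]))
        gaps.append((t["after"] - t["before"]) - (c["after"] - c["before"]))
    return sum(gaps) / len(gaps)

# Hypothetical municipalities with pre-estimated propensity scores.
units = [
    {"treat": 1, "score": 0.70, "before": 30.0, "after": 37.0},
    {"treat": 1, "score": 0.55, "before": 28.0, "after": 33.0},
    {"treat": 0, "score": 0.68, "before": 27.0, "after": 30.0},
    {"treat": 0, "score": 0.50, "before": 25.0, "after": 27.0},
    {"treat": 0, "score": 0.10, "before": 20.0, "after": 20.0},
]

effect = matched_diff_in_diff(units)
print(effect)  # pairs give 7 - 3 = 4 and 5 - 2 = 3, so the mean is 3.5
```

The comparison unit with a score of 0.10 is never used, which shows how matching discards units that are poor counterfactuals before the double difference is taken.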

Examples of Impact Assessment of Physical Activity Programs
Physical