Methodology

Statistically, we faced many challenges in compiling and constructing the Ibrahim Index of African Governance. These included choosing the most appropriate statistical method to aggregate the data into one composite index and, at a more basic level, finding the most “suitable” set of indicators to reflect governance as defined by the Board of the Foundation, its Founder, and its Advisory Council and Technical Committee members. Essentially, the Ibrahim Index considers governance from the point of view of the citizen. It measures the extent to which governments and non-state actors deliver a large number of economic, social and political goods and services to the citizen. The Index groups indicators into four main categories: Safety and the Rule of Law, Participation and Human Rights, Sustainable Economic Opportunity, and Human Development. The indicators in each category are proxies for the quality of the outcomes and outputs of governance.

The 2009 Ibrahim Index of African Governance is a work in progress that builds on the work of the first two years. We assembled a number of official data indicators that capture governance as the Foundation and its partners perceive it. In addition, this year, largely as a result of direct input from our African partners and interlocutors, we have introduced a considerable number of qualitative indicators (expert assessments). We believe that these add substantial value and nuance in capturing aspects of governance, particularly in assessing recent governance performance.

This is especially the case given the highly lagged nature of much of the official data, some of which had to be excluded because they reflected little of a government’s recent performance. However, while the worst of those legacy indicators were excluded, some indicators with a more limited legacy effect are still included.

At the practical level, we found that many official data indicators that we would have liked to include did not have sufficient data coverage and were not released or updated frequently enough to warrant inclusion. This factor led us to exclude what could arguably be considered the most important indicators of governance: poverty indicators.

On another front, and as in previous years, data were missing for many of the indicators that we include for many periods during 2000-2008, particularly in the earlier years. This meant that the missing values had to be estimated. In most cases, we substituted the country mean for that variable (or extrapolated) where appropriate. However, in a few instances, we did not explicitly estimate missing data, and the Index was computed by averaging all indicators, with and without explicit estimates for missing values. An example where no explicit estimation was done is the indicator for “Internally Displaced Persons (IDPs)”. Where data were available, the number was taken for that year. But it is clearly not appropriate to take an “average” of IDPs for a year in which data were not available, since that year may have been free of IDPs. Hence, we left that country data-year empty.
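The mean-substitution rule described above can be sketched in a few lines of Python. This is an illustration only: the function name `impute_country_mean`, the indicator name and all values are hypothetical, and the extrapolation variant mentioned in the text is not covered.

```python
from statistics import mean


def impute_country_mean(series):
    """Fill missing years (None) with the mean of the country's observed years."""
    observed = [value for value in series.values() if value is not None]
    if not observed:
        # No data at all for this country: nothing to base an estimate on.
        return dict(series)
    fill = mean(observed)
    return {year: fill if value is None else value for year, value in series.items()}


# Hypothetical values for one country and one indicator.
enrolment = {2000: None, 2001: 60.0, 2002: None, 2003: 64.0, 2004: 62.0}
filled = impute_country_mean(enrolment)
print(filled[2000])  # 62.0, the mean of 60.0, 64.0 and 62.0

# For an indicator such as IDPs no estimate is made: the gaps stay empty.
idps = {2000: None, 2001: 12000, 2002: None}
```

Keeping the IDP series untouched mirrors the reasoning in the text: a missing year may simply have had no displaced persons, so a country mean would be misleading.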

In some instances, we found some important and very useful proxies of governance in a particular dimension (such as maternal mortality in the Human Development category), but we could only find data for one year, and so we had to exclude that indicator. When another year (or more) of data becomes available, it is likely that we will include that indicator in the Index.

The Ibrahim Index is a composite, and as such, it could be seen as a “poll of polls”, utilizing data from 21 external institutions. Once the raw data on all the indicators had been gathered, a method was chosen to put them on a common scale, that is to say, to re-scale or normalise the raw data so that they can be usefully combined to produce an overall score for each country. There are a number of statistical methods and data aggregation techniques to choose from. The Index uses the same method as in the past, namely, the “min-max” method. However, because the min-max method is highly sensitive to outliers, this year a statistical technique was used to address them.

Fundamentally, the min-max method involves re-scaling the raw data values to a scale of 0-100, for every indicator, for every country, and for every year. This re-scaling is done using the formula:

[x_t - Min(X)] / [Max(X) - Min(X)] * 100

Where x_t is the raw value for that indicator for a particular country in year t, and Min(X) and Max(X) are the minimum and maximum values for that indicator over the whole period and for all countries. The final result was subtracted from 100 where necessary, so that a higher number always indicated a better performance.
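As a minimal Python sketch of the min-max formula above: the function name `min_max_rescale`, the `higher_is_better` flag and the sample values are illustrative assumptions. The input list is assumed to hold the raw values of one indicator for all countries and all years.

```python
def min_max_rescale(values, higher_is_better=True):
    """Rescale the raw values of one indicator to the 0-100 range."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("indicator has no variation; min-max is undefined")
    scaled = [(x - lowest) / (highest - lowest) * 100 for x in values]
    if not higher_is_better:
        # Subtract from 100 so that a higher score always means better performance.
        scaled = [100 - score for score in scaled]
    return scaled


raw = [2.0, 4.0, 10.0]  # hypothetical raw values
print(min_max_rescale(raw))                          # [0.0, 25.0, 100.0]
print(min_max_rescale(raw, higher_is_better=False))  # [100.0, 75.0, 0.0]
```

Because the minimum and maximum are taken over all countries and all years, a single extreme observation compresses every other score, which is why outliers matter so much for this method.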

The following indicators were filtered to correct for outliers:

Battle Deaths, Civilian Deaths, Refugees, IDPs, Budget Surplus or Deficit as a Percentage of GDP, External Debt Service Ratio, Import Cover, Dealing With Licenses: time taken to complete each procedure, Dealing With Licenses: cost of each procedure, Starting a Business, and Inflation. All of these indicators were adjusted according to the following scheme:

  1. The trimmed mean and trimmed standard deviation of the variable were computed. Specifically, the mean and standard deviation were computed on the central 95% of the distribution (i.e., the bottom and top 2.5% were not used to compute mean and standard deviation).

  2. All observations that lie more than 2.5 trimmed standard deviations away from the trimmed mean were replaced with the trimmed mean + 2.6 trimmed standard deviations if they are in the right tail, and with the trimmed mean - 2.6 trimmed standard deviations if they are in the left tail.
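The two-step scheme above can be sketched in Python as follows. The 2.5% trimming, the 2.5 threshold and the 2.6 replacement value come from the text. The function names, the use of the sample standard deviation, and the rounding down of the number of trimmed observations are assumptions made for illustration.

```python
from statistics import mean, stdev


def trimmed_moments(values, tail=0.025):
    """Mean and standard deviation of the central part of the distribution."""
    ordered = sorted(values)
    k = int(len(ordered) * tail)  # observations dropped in each tail (assumed: rounded down)
    core = ordered[k:len(ordered) - k]
    return mean(core), stdev(core)  # sample standard deviation (assumption)


def cap_outliers(values, threshold=2.5, cap=2.6, tail=0.025):
    """Replace observations beyond the threshold with mean +/- cap * sd."""
    m, s = trimmed_moments(values, tail)
    upper_limit, lower_limit = m + threshold * s, m - threshold * s
    capped = []
    for x in values:
        if x > upper_limit:
            capped.append(m + cap * s)   # right tail
        elif x < lower_limit:
            capped.append(m - cap * s)   # left tail
        else:
            capped.append(x)
    return capped


# Hypothetical indicator: 39 ordinary values and one extreme observation.
values = list(range(1, 40)) + [1000]
capped = cap_outliers(values)
print(round(capped[-1], 2))  # the extreme value is pulled in to about 49.39
```

Only the extreme observation changes; the remaining values are passed through untouched, so the subsequent min-max rescaling is no longer dominated by a single data point.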

The inflation data were severely skewed by the extreme Zimbabwe values of later years. Because of the very long right tail, this indicator needed different parameters and had to be adjusted twice: first, all observations that were more than 5 trimmed standard deviations above the trimmed mean were replaced with the trimmed mean + 5.2 trimmed standard deviations. In cases of negative inflation, the relevant observations were replaced with 0.

Once the variables that required filtering for outliers had been filtered, each variable used to compile the Index was rescaled as per the normalisation formula above.
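A hedged Python sketch of the inflation adjustment is given below. The thresholds (5 and 5.2) and the replacement of negative values with 0 come from the text. The order of the two operations, the computation of the trimmed moments before any replacement, and the function name `filter_inflation` are assumptions, and the sample values are hypothetical.

```python
from statistics import mean, stdev


def filter_inflation(values, threshold=5.0, cap=5.2, tail=0.025):
    """Cap the right tail of inflation and replace negative values with 0."""
    ordered = sorted(values)
    k = int(len(ordered) * tail)          # observations trimmed from each tail
    core = ordered[k:len(ordered) - k]    # central 95% of the distribution
    m, s = mean(core), stdev(core)
    limit = m + threshold * s
    # Right tail only: extreme inflation is pulled in to mean + 5.2 sd.
    capped = [m + cap * s if x > limit else x for x in values]
    # Negative inflation is replaced with 0.
    return [max(x, 0.0) for x in capped]


# Hypothetical inflation rates: one deflation year, one hyperinflation year.
inflation = [-1.5] + [float(i) for i in range(2, 40)] + [231000000.0]
filtered = filter_inflation(inflation)
print(filtered[0])             # 0.0
print(round(filtered[-1], 2))  # about 78.29
```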

Some indicators are composed of clusters of variables. This clustering was done when several sources appeared to be measuring very similar dimensions (such as Property Rights or Press Freedom). To avoid double counting and confusion, once the raw values of the component variables had been normalised using the min-max method, we averaged the scores of the component variables to arrive at the overall score for that clustered indicator. For example, the indicator “Press Freedom” is composed of three variables, each coming from a different source measuring the extent of the freedom of the press in each country. Given the extreme similarity of those variables, after each raw data set was normalised, we took the simple average of the three re-scaled variables and entered this average into the Index as one indicator called “Press Freedom”.
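The clustering step can be illustrated with a short Python sketch: each source is normalised on its own scale first, and only then are the scores averaged. The three source lists and their values are hypothetical, the function names are illustrative, and every source is assumed to be oriented so that a higher value is better.

```python
from statistics import mean


def rescale(values):
    """Min-max rescale one source to 0-100 (higher is assumed to be better)."""
    lowest, highest = min(values), max(values)
    return [(x - lowest) / (highest - lowest) * 100 for x in values]


def clustered_indicator(sources):
    """Normalise each source separately, then average the scores per country."""
    normalised = [rescale(source) for source in sources]
    return [mean(scores) for scores in zip(*normalised)]


# Three hypothetical sources on different raw scales, for three countries.
source_a = [10, 20, 30]
source_b = [1.0, 3.0, 5.0]
source_c = [40, 100, 70]
press_freedom = clustered_indicator([source_a, source_b, source_c])
print([round(score, 2) for score in press_freedom])  # [0.0, 66.67, 83.33]
```

Entering one averaged score rather than three separate ones means the cluster counts once in its sub-category, which is the double-counting safeguard the text describes.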

The sub-category scores were calculated by averaging the scores of all the component indicators. Category scores were calculated by averaging the scores of the sub-categories, and finally, the overall Index scores were obtained by simply averaging the scores of the four categories (Safety and the Rule of Law, Participation and Human Rights, Sustainable Economic Opportunity, and Human Development).*
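The three levels of averaging can be sketched in Python as below. The four category names come from the text. The sub-category labels (`sub_1`, `sub_2`), indicator labels (`ind_1` and so on) and all scores are hypothetical placeholders, and the function name `index_score` is illustrative.

```python
from statistics import mean


def index_score(categories):
    """Average indicators into sub-categories, then categories, then the Index."""
    category_scores = {}
    for category, sub_categories in categories.items():
        sub_scores = [mean(list(indicators.values()))
                      for indicators in sub_categories.values()]
        category_scores[category] = mean(sub_scores)
    overall = mean(list(category_scores.values()))
    return overall, category_scores


# Hypothetical normalised scores (0-100) for one country in one year.
country = {
    "Safety and the Rule of Law": {
        "sub_1": {"ind_1": 60.0, "ind_2": 40.0},
        "sub_2": {"ind_3": 70.0},
    },
    "Participation and Human Rights": {"sub_1": {"ind_1": 50.0}},
    "Sustainable Economic Opportunity": {"sub_1": {"ind_1": 30.0, "ind_2": 50.0}},
    "Human Development": {"sub_1": {"ind_1": 80.0}, "sub_2": {"ind_2": 60.0}},
}
overall, by_category = index_score(country)
print(by_category["Safety and the Rule of Law"])  # 60.0
print(overall)                                    # 55.0
```

One consequence of simple averaging at each level is that an indicator in a small sub-category carries more weight than one in a large sub-category: in this example `ind_3` counts as much as `ind_1` and `ind_2` combined.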

The methods used to compile the Index and the nature of the data mean that, for cross-country comparisons and comparisons over time, it is more instructive to look at scores and ranks in more recent years than in the early years. A key reason for this is that data in the early years are patchy, but data availability improves substantially over time. Comparisons of scores across sub-categories and categories are misleading and so should be avoided. Moreover, comparisons across countries (for the same period) should take account of the considerable margins of error that are present in any governance index or indicator.

The main sources of uncertainty in the computation of the Index arise from measurement errors and missing data.** Standard errors and confidence intervals that capture the uncertainty arising from missing value imputation were computed via a simulation exercise that approximates the formal multiple imputation approach (Rubin, 1987).*** The standard errors we computed allow users of the Index to discriminate, to a certain degree, between changes in the value of the Index that can be confidently treated as actual changes in the state of governance as we define it, and changes that might be due to “noise”, or that are at least too small for a high likelihood of significance to be ascribed to them.
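The text does not describe the simulation in detail, so the following Python sketch is a generic illustration of the idea, not the Foundation's procedure. It repeatedly fills each missing value with a random draw (with replacement) from the observed values, recomputes the average each time, and uses the spread of the results as a standard error. The function name, the number of draws, the seed and the sample scores are all assumptions.

```python
import random
from statistics import mean, stdev


def imputation_standard_error(scores, n_draws=1000, seed=0):
    """Spread of an average score across repeated random imputations."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = [value for value in scores if value is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    replicates = []
    for _ in range(n_draws):
        # Fill every gap with a random draw from the observed values.
        completed = [rng.choice(observed) if value is None else value
                     for value in scores]
        replicates.append(mean(completed))
    return mean(replicates), stdev(replicates)


# Hypothetical indicator scores for one country; None marks a missing value.
scores = [55.0, None, 61.0, 58.0, None, 64.0]
estimate, standard_error = imputation_standard_error(scores)

# With no missing values, imputation adds no uncertainty at all.
_, no_gap_error = imputation_standard_error([55.0, 61.0])
print(no_gap_error)  # 0.0
```

The more values that have to be imputed, the wider the spread of the replicates, which fits the observation in the text that the patchier early years are less reliable for comparison.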

Those margins of error mean that score or rank comparisons should be avoided when differences across countries are small, since such differences would reflect a statistical “tie”. On average, we found that the margin of error for the overall Index scores was around +/- 7 points. Those margins of error rise substantially in some instances when calculated at the sub-category level. More broadly, governance is not a phenomenon that changes quickly over short periods of time, unless a large shock occurs, such as a war or a coup d’état.

In the short to medium term, we face some other challenges. Some of those are, as discussed above, data-centered, particularly in the Human Development category, and relate more specifically to the problem of obtaining more comprehensive and less patchy datasets on poverty. Another type of data that we hope to incorporate, when sufficient country coverage becomes available in the relevant dimensions, is citizen (and entrepreneur) survey data.

Finally, in the 2009 edition of the Ibrahim Index, in order to render the Index more reflective of recent performance, we decided to use the latest available data for every indicator where it was available. This means that for the year marked 2007/2008, for example, 2008 data were used if available, and 2007 if not.
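The "latest available" rule can be expressed as a small Python sketch. The function name `latest_available` and the sample values are hypothetical.

```python
def latest_available(observations, years):
    """Return (year, value) for the most recent year that has data, else None."""
    for year in sorted(years, reverse=True):
        value = observations.get(year)
        if value is not None:
            return year, value
    return None


# Hypothetical values for the period marked 2007/2008.
print(latest_available({2007: 41.0, 2008: 43.5}, (2007, 2008)))  # (2008, 43.5)
print(latest_available({2007: 41.0, 2008: None}, (2007, 2008)))  # (2007, 41.0)
print(latest_available({}, (2007, 2008)))                        # None
```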

* While this process of compiling the Index may be done adequately using standard spreadsheet software, for the sake of accuracy and precision we used a programming language called “R”, a freely available system for statistical computation and graphics.
** It could also be argued that the weights applied to the categories, all of which are weighted equally, generate a degree of uncertainty in the scores.
*** Standard errors and confidence intervals were computed using a statistical technique called “bootstrapping”.

Index Indicators

Index Sources