Saturday, April 2, 2011

Biases in the second Asianet-C Fore opinion polls in Kerala

Asianet-C Fore has published a controversial set of estimates in its second opinion poll in the run-up to the 2011 Kerala Assembly elections. The results have several problems, and many of the criticisms levelled at the survey appear justified.

The results of the SECOND SURVEY show a major change in vote shares and seats from the earlier survey conducted by Asianet-C Fore. On March 9th, Asianet-C Fore published the results of the FIRST SURVEY, which showed that
  • UDF will win 77-87 seats
  • LDF will win 53-63 seats
  • BJP will win up to 5 seats
Further, the vote shares of each party were the following:
  • UDF: 43 per cent
  • LDF: 39 per cent
  • BJP and others: 18 per cent
The new results of the SECOND SURVEY that were shown on March 31st show a very different picture. This new survey shows that,
  • UDF will win 80-90 seats
  • LDF will win 50-60 seats
  • BJP will win up to 2 seats
Further, the vote shares were the following:
  • UDF: 46 per cent
  • LDF: 41 per cent
  • BJP: 9 per cent
  • Others: 4 per cent
What is interesting here is that even though the LDF's vote share has gone up by 2 percentage points, its predicted seat range has come down by 3 (from 53-63 to 50-60). This is where, as we shall argue below, the robustness of the prediction model and the randomness of the sample can be called into serious question.

Any psephological survey involves a few typical steps, each of which is subject to error. First, you select a sample set of constituencies for the survey. Here, it is important to know whether the sample of constituencies is randomly selected or purposively selected. Some may argue that a random selection may fail to capture the diversity of electoral regions. However, if the constituencies are purposively selected, the “purpose” and its rationale need to be publicised and justified.

Secondly, within these constituencies, you randomly select a set of voters. Here again, two sets of errors can enter. There could be a sampling error, which could be the outcome of the choice of the sampling design itself. There could also be a non-sampling error, which arises from errors in measurement and recording of data.

Thirdly, to compute vote shares, you use a suitable model of voting behaviour that combines past data with the vote shares computed from the sample opinion survey. That is, you have baseline information from the previous election: the vote share of each party and who won how many seats. Based on this past information and the results of the present opinion poll, you estimate a “swing” for each party. Applying this state-level swing gives you the estimate of the state-wide vote share of each party. You can also compute swings for regions within the state, i.e., for Malabar, central Kerala and south Kerala. Here, of course, the major possible source of error lies in the assumption that the swing is uniform across the state or across a region.
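The swing calculation described above can be sketched as follows. This is a minimal illustration of a uniform swing, not the model used by any agency discussed here; the function names and every number in it are hypothetical.

```python
# Illustrative uniform-swing projection.
# All shares below are hypothetical, not taken from any survey.

def swing(previous_share, polled_share):
    """Swing in percentage points: polled share minus previous-election share."""
    return polled_share - previous_share

def apply_uniform_swing(baseline, swings):
    """Add the same swing to a unit's baseline shares, clip at zero,
    and renormalise so that the shares again sum to 100."""
    shifted = {p: max(0.0, baseline[p] + swings.get(p, 0.0)) for p in baseline}
    total = sum(shifted.values())
    return {p: 100.0 * s / total for p, s in shifted.items()}

# Hypothetical state-wide shares (per cent): last election vs. current poll.
previous_state = {"LDF": 48.0, "UDF": 43.0, "Others": 9.0}
polled_state = {"LDF": 44.0, "UDF": 46.0, "Others": 10.0}
state_swing = {p: swing(previous_state[p], polled_state[p]) for p in previous_state}

# Hypothetical baseline for one constituency, shifted by the state-level swing.
seat_baseline = {"LDF": 51.0, "UDF": 41.0, "Others": 8.0}
projected = apply_uniform_swing(seat_baseline, state_swing)

print(state_swing)
print(projected)
```

The weakness noted in the text is visible in the code: every constituency receives exactly the same shift, whatever its local conditions.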

Fourthly, the number of predicted seats is arrived at from these vote shares. It is here that the greatest possibility of error lies. Let me take a commonly used method, which I borrow from the well-known statistician and psephologist Rajeeva Karandikar. After assuming that the swing is constant across a region and across the state, you further assume that the swing in one seat is a convex combination of the state-level swing and the region-level swing. A convex combination is a linear combination of points in which all coefficients are non-negative and sum to 1. Based on this swing for each seat, you arrive at a probability of winning for each candidate. Summing the winning probabilities of the UDF/LDF candidates across all seats gives you the expected total number of seats for the UDF/LDF in the state as a whole.
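A minimal sketch of this step is given below. The convex combination and the summing of probabilities follow the description above; the normal error model used to turn a projected lead into a winning probability, the weight, the standard deviation and all the seat data are my own illustrative assumptions, not details of Karandikar's method.

```python
from math import erf, sqrt

def seat_swing(state_swing, region_swing, weight):
    """Convex combination: weight * state + (1 - weight) * region, 0 <= weight <= 1."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * state_swing + (1.0 - weight) * region_swing

def win_probability(projected_lead, sd):
    """P(true lead > 0), ASSUMING the projected lead has a normal error
    with standard deviation sd (an illustrative model, not from the source)."""
    return 0.5 * (1.0 + erf(projected_lead / (sd * sqrt(2.0))))

def expected_seats(previous_leads, state_swing, region_swings, weight, sd):
    """Sum of winning probabilities over all seats for one front.

    previous_leads: list of (region, lead of the front in points at the last election).
    Swings are expressed as changes in that lead, in points.
    """
    total = 0.0
    for region, lead in previous_leads:
        projected = lead + seat_swing(state_swing, region_swings[region], weight)
        total += win_probability(projected, sd)
    return total

# Hypothetical data: four seats in two regions.
seats = [("north", 6.0), ("north", -2.0), ("south", 1.0), ("south", -8.0)]
estimate = expected_seats(
    seats,
    state_swing=-3.0,
    region_swings={"north": -5.0, "south": -1.0},
    weight=0.5,   # hypothetical weight on the state-level swing
    sd=4.0,       # hypothetical uncertainty of each seat's lead
)
print(round(estimate, 2))
```

Small changes to the weight or to the assumed uncertainty move the expected seat count, which is why the choice of these parameters should be disclosed.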

However, most survey agencies never publicise the exact method of transforming vote shares into number of seats. It is always a black box, except in a few rare instances. For example, in 1998, the Frontline-APT Research Group opinion poll in Tamil Nadu publicised the details of its methodology of converting vote shares into number of seats. This survey used a modified version (to suit Indian realities) of the “Cube Law” for this purpose; the Cube Law states that if the vote shares of two parties ‘A’ and ‘B’ are ‘a’ and ‘b’, their seats will be in the ratio a³ to b³. In most cases of opinion polls, the actual method remains unknown. According to Professor Venkatesh Athreya, a close observer of opinion polls,

In our first-past-the-post electoral system, an element of judgment is inevitable in moving from vote share forecasts to seat predictions, since there is no one-to-one correspondence between the two. Factors such as the degree of geographical concentration of a political formation's voter support, the degree of polarity in the electoral contest (whether a contest is bipolar, tripolar or even more multipolar) and the complexity (both on paper and in practice) of the electoral adjustments worked out are all crucial to determining the extent to which an increase in vote share translates into an increase in seats. This particular aspect needs to be borne in mind when evaluating or utilising opinion poll survey results.
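To make the Cube Law mentioned above concrete, here is a small sketch of the unmodified law. It treats all 140 seats as a straight two-way contest and ignores every other party; the “modified version” used by Frontline-APT is not described here, so this is not their method.

```python
def cube_law_seats(share_a, share_b, total_seats):
    """Split total_seats between two parties in the ratio share_a**3 : share_b**3."""
    a3, b3 = share_a ** 3, share_b ** 3
    seats_a = total_seats * a3 / (a3 + b3)
    return seats_a, total_seats - seats_a

# Vote shares reported in the second Asianet-C Fore survey (per cent),
# applied to all 140 Assembly seats as if the contest were purely two-way.
udf_seats, ldf_seats = cube_law_seats(46.0, 41.0, 140)
print(round(udf_seats, 1), round(ldf_seats, 1))
```

With these inputs the law gives roughly 82 seats to the UDF and 58 to the LDF. The point is only that a cubic rule turns a 5-point gap in votes into a much larger gap in seats, which is the kind of sensitivity Athreya warns about.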

In the Asianet-C Fore survey, only sketchy details of the methodology used are given. Asianet-C Fore had done a first survey in the month of February 2011, and the methodological details of that survey are available. I give below the only details that they have provided:

C fore (Centre for Forecasting & Research) conducted the [first] pre-poll survey between 23rd February and 7th March, 2011 in Kerala. In all, 6112 voters were interviewed using a structured questionnaire from 40 assembly constituencies using systematic random sampling method. In each constituency 5 urban and 15 rural locations were selected. For every location, a starting point was selected randomly in north, south, east and west direction. From each starting point right hand rule was followed and one person (above 18 years of age) was interviewed from a household with interval of 10 households. Thus, in all, polling was conducted in 200 urban and 600 rural localities in the state. Care was taken to ensure that different castes and communities were represented in the sample in their actual proportion. The survey has a margin of error of 1 percentage point at 90 percent confidence level.

It is not clear if the same methodology was followed for the second survey also.

Further, there are no details of (a) whether the 40 constituencies were selected randomly or purposively; (b) whether regional considerations were taken into account while selecting the 40 constituencies; (c) how care was taken to ensure that different castes and communities are represented in the sample; and (d) what method was used to convert vote shares into seats. Also missing are scenarios showing how the predictions would change if the estimated swings were in error.
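As a rough plausibility check on the quoted margin of error, the sketch below applies the textbook formula for a proportion under simple random sampling to the stated sample of 6112 voters. The design effect of 2.0 in the second call is purely an illustrative assumption to show how clustering in 40 constituencies could widen the margin; it is not a figure from the survey.

```python
from math import sqrt

Z_90 = 1.645  # two-sided 90 per cent critical value of the standard normal

def margin_of_error(n, p=0.5, z=Z_90, design_effect=1.0):
    """Half-width of the confidence interval for a proportion, in percentage points.

    p = 0.5 is the worst case; design_effect > 1 inflates the variance
    to allow for clustered (non-simple-random) sampling.
    """
    return 100.0 * z * sqrt(design_effect * p * (1.0 - p) / n)

srs = margin_of_error(6112)                            # simple random sampling
clustered = margin_of_error(6112, design_effect=2.0)   # hypothetical design effect

print(round(srs, 2), round(clustered, 2))
```

The simple-random-sampling figure comes out at about 1.05 percentage points, close to the 1 point quoted. Estimates for individual castes and communities rest on far fewer respondents and would carry much wider margins.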

Given these handicaps, it is extremely difficult to comment on the results of the Asianet-C Fore survey. This is because, as explained before, the robustness of the model and the randomness of the sample are critical in predicting the number of seats. If the sample is not random (both the sample of constituencies and the sample of voters), the results can go haywire. If the prediction model is sensitive to small shifts, the results can turn bizarre. From what we know of the results of Asianet-C Fore, these errors could be large.

Below, I list some of the serious anomalies in the results. I use the post-poll results from two earlier elections in Kerala, 2004 and 2009, to compare the nature of the shifts that the 2011 results throw up. The 2004 and 2009 surveys were conducted as part of the National Election Study (NES) of Lokniti in New Delhi. The community-wise shift in votes is extremely surprising (see Tables 1, 2 and 3 below). For instance,

  • Compared to the 2004 Parliament elections, the upper caste Hindu vote share of the LDF has fallen from 40 per cent to 13 per cent.
  • While the share among Ezhavas has increased for the LDF (from 58 per cent in 2004 to 68 per cent in 2011), the share among Hindu OBCs has fallen from 52 per cent in 2004 to 40 per cent in 2011. How can the share of Ezhava votes for the LDF increase and the share of Hindu OBC votes for the LDF fall, both so significantly? For the Hindu OBCs, the vote share for the UDF has increased from 17 per cent in 2004 to 41 per cent in 2011.
  • Among Dalits, the vote share of LDF in 2004 was 72 per cent and in 2009 was 69 per cent, according to NES. However, the Asianet survey shows that it has fallen to 50 per cent in 2011.
  • No survey anywhere in Kerala has shown till now that the vote share among upper-caste Hindus and Syrian Christians is higher for the BJP and others than for the LDF. The survey is saying that a non-UDF/non-LDF combination can net more votes among Syrian Christians than the LDF. This appears to be an extraordinarily mistaken result that defies common sense.

TABLE 1
NATIONAL ELECTION STUDY (NES) RESULTS FOR 2004, KERALA, in per cent

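| Caste/Religion | Share for LDF | Share for UDF | Share for BJP | Share for Others | N |
|---|---|---|---|---|---|
| Hindu upper caste | 40 | 43 | 12 | 5 | 93 |
| Nairs | 42 | 29 | 26 | 4 | 84 |
| Ezhavas | 58 | 22 | 18 | 2 | 238 |
| OBCs | 52 | 17 | 27 | 4 | 75 |
| Dalits | 72 | 17 | 8 | 2 | 87 |
| Muslims | 41 | 57 | 1 | 1 | 140 |
| Christians | 31 | 62 | 2 | 5 | 214 |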
Source: National Election Study, 2004, weighted data set.

TABLE 2
NATIONAL ELECTION STUDY (NES) RESULTS FOR 2009, KERALA, in per cent

| Caste/Religion | Share for LDF | Swing for LDF from 2004 | Share for UDF | Swing for UDF from 2004 |
|---|---|---|---|---|
| Nairs | 27 | -14 | 33 | +4 |
| Ezhavas | 57 | -1 | 27 | +5 |
| Dalits | 69 | -5 | 15 | +5 |
| Muslims | 26 | -3 | 69 | -2 |
| Christians | 32 | -15 | 69 | +13 |
Source: National Election Study, 2009, weighted data set.

TABLE 3
ASIANET-C Fore RESULTS OF THE SECOND SURVEY, 2011, KERALA

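| Caste/Religion | Share (%) for LDF | Share (%) for UDF | Share (%) for BJP & Others |
|---|---|---|---|
| Hindu upper caste | 13 | 60 | 27 |
| Ezhavas | 68 | 25 | 7 |
| Hindu OBCs | 40 | 41 | 19 |
| Dalits | 50 | 34 | 16 |
| Syrian Christians | 11 | 77 | 12 |
| Other Christians | 14 | 73 | 13 |
| Muslims | 23 | 70 | 7 |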
Source: Asianet


There is further evidence of the bias of the second Asianet-C Fore survey. Let us compare the results of the first survey and second survey.

  • In the first survey, among Hindu upper caste voters, 65 per cent would vote for UDF, 22 per cent would vote for LDF and 13 per cent would vote for Others (see Table 4 below).
  • However, in the second survey, among Hindu upper caste voters, 60 per cent would vote for UDF, 13 per cent would vote for LDF and 27 per cent would vote for Others (see Table 3 above).

TABLE 4
ASIANET-C Fore RESULTS OF THE FIRST SURVEY, 2011, KERALA

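| Caste/Religion | Share (%) for LDF | Share (%) for UDF | Share (%) for BJP & Others |
|---|---|---|---|
| Hindu upper caste | 22 | 65 | 13 |
| Ezhavas | 47 | 35 | 18 |
| Hindu OBCs | 42 | 37 | 21 |
| Dalits | 54 | 31 | 15 |
| Syrian Christians | 14 | 70 | 16 |
| Other Christians | 22 | 68 | 10 |
| Muslims | 21 | 72 | 7 |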
Source: Asianet

How did such a large share of Hindu upper caste voters decide to vote against both the UDF and the LDF, and in favour of Others, over a period of just one month? Was there a deliberately induced bias in the second survey?

All these unexplained trends raise serious questions about the randomness of the sample used in the Asianet-C Fore survey. Further, given that the same survey shows that the candidate’s personal qualities are important in voting patterns and that “political affiliations” are less important, would the assumption of a state-level or region-level swing be appropriate? Is that a safe assumption? It appears not, and this reinforces the doubt that the sampling is not as random as it should be.

Robust psephology, as pioneers in the field would tell you, requires good statistics, lots of common sense and a good understanding of ground realities (“domain knowledge”). The Asianet-C Fore survey appears lacking in all three.

In addition, the results of other opinion polls appear at variance with Asianet-C Fore’s. The Institute for Monitoring Economic Growth (IMEG) has announced its opinion poll results. On the positive side, more details of the methodology are available from IMEG than from Asianet-C Fore:

The survey used a methodology somewhat improved over that of the successful studies IMEG conducted in the last three general elections and in the Ernakulam by-election. The opinion poll result is the pooled result of three types of surveys. The first is a swing survey covering loyalty to the fronts and shifts in that loyalty. The second covers voters' opinions of the central and state governments, the correlation between voters' political leanings and the newspapers they read and the television channels on which they watch news, and voters' interest in computers and the internet. The third type is a direct opinion survey. In randomly selected households in selected wards of Kerala's 140 constituencies, respondents were given a slip on which to record in secret which of the UDF, the LDF, the BJP and other major parties they intend to vote for; the marked slip was sealed and deposited in specially prepared covers. Alongside this, IMEG faculty members directly conducted a hit-and-run survey on matching samples in all the constituencies. The survey result combines all three.

Across the three studies, the IMEG team met a total of 59,678 people in the survey. The sampling error (SE) of the studies can be put at 2 per cent and the non-sampling error at 1 per cent.

Here, the claim is that all 140 constituencies were covered as in a census, and there was no sampling of constituencies. This does appear to be needless effort to “spread resources thin”; as Venkatesh Athreya has noted:

A priori, results from surveys with a significantly larger number of sample constituencies may be regarded as more robust. This is in some ways more important than the size of the sample in terms of the number of respondents per se, provided of course that the latter does not go below a critical minimum figure. It must be noted, however, that increasing the number of sample constituencies beyond a critical minimum size also does not yield much greater precision in vote share estimates.

Yet, given that the sample size of voters is also higher in the IMEG survey, ceteris paribus, it appears to have higher reliability than the Asianet-C-Fore survey.

The IMEG survey results suggest that the UDF may win 72-82 seats, while the LDF may get 58-68 Assembly seats. The BJP has very little chance of opening an account, even though it may improve its position. The survey also observes that there is a very close contest in 20 Assembly segments, where the results can go either way. This finding on 20 seats should have forced IMEG to declare the election too close to call. Although they have gone ahead and made a prediction all the same, this result appears to be more realistic than Asianet-C Fore’s, partly because of the larger spread of constituencies (all 140, as compared to 40) and partly because of the larger sample size (59,678 as compared to 6,112).

A third survey result has been reported by an agency called Centre for Electoral Studies (CES), financed fully by Asianet. However, Asianet has been trying to underplay the results of this survey and overplay its predictions with C-Fore. Asianet has even refused to scroll these results in the news bar, though it did allow 9 minutes during its News Hour show for a discussion on this survey. According to Dr Syam Lal, who is attached to CES, the sample size of voters was 3625 from 35 constituencies. 105 respondents were selected from each constituency on the basis of systematic random sampling method from 3 polling stations selected again on systematic random sampling method. Voter preferences were elicited through an actual mock ballot. The 35 constituencies were selected on the basis of probability proportionate to size sampling method.
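The two sampling schemes named here, probability proportionate to size (PPS) for constituencies and systematic random sampling for respondents, can be sketched as below. This is a generic textbook version, not CES's actual procedure; the constituency labels, the electorate sizes and the roll length of 1200 are all hypothetical.

```python
import random

def pps_systematic(sizes, k, rng):
    """Select k units with probability proportional to size, using
    systematic sampling along the cumulative size totals."""
    total = sum(sizes.values())
    step = total / k
    start = rng.random() * step                      # random start in [0, step)
    targets = [min(start + i * step, total) for i in range(k)]
    chosen, cumulative, t = [], 0, 0
    for name, size in sizes.items():
        cumulative += size
        # A unit larger than `step` could be hit more than once.
        while t < k and targets[t] <= cumulative:
            chosen.append(name)
            t += 1
    return chosen

def systematic_sample(n_items, n_wanted, rng):
    """Indices of n_wanted items taken at a fixed interval after a random start."""
    interval = n_items / n_wanted
    start = rng.random() * interval
    return [int(start + i * interval) for i in range(n_wanted)]

rng = random.Random(2011)  # fixed seed so the sketch is reproducible

# Hypothetical electorate sizes for 140 constituencies.
electorate = {f"C{i:03d}": rng.randint(120_000, 200_000) for i in range(1, 141)}
constituencies = pps_systematic(electorate, 35, rng)

# 105 respondents from 3 polling stations means 35 per station;
# here they are drawn from a hypothetical roll of 1200 voters.
respondents = systematic_sample(1200, 35, rng)

print(len(constituencies), len(respondents))
```

Under PPS, a constituency with twice the electorate is twice as likely to be picked, so equal-sized samples within each selected constituency give every voter roughly the same overall chance of selection.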

The CES survey also shows results very different from Asianet-C-Fore’s. According to the CES, the difference in vote shares between the LDF and UDF is extremely narrow; the UDF has a vote share of 44.9 per cent, while the LDF has a vote share of 44.3 per cent. Accordingly the seat predictions move much in favour of the LDF: the LDF would get 64 to 70 seats, while the UDF would get 70 to 76 seats. This represents an extremely close finish, with slight margins of error becoming capable of tilting the balance either way.
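How fragile a 0.6-point lead is can be checked with the standard formula for the difference between two shares estimated from the same sample. The sketch assumes simple random sampling of the 3625 respondents and a 95 per cent confidence level; both are my assumptions, not figures reported by CES.

```python
from math import sqrt

def lead_margin_of_error(p1, p2, n, z=1.96):
    """Half-width, in percentage points, of the confidence interval for p1 - p2
    when both shares come from the same multinomial sample of size n.

    Var(p1 - p2) = (p1 + p2 - (p1 - p2)**2) / n
    """
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return 100.0 * z * sqrt(variance)

udf, ldf, n = 0.449, 0.443, 3625       # shares and sample size reported for the CES survey
lead = 100.0 * (udf - ldf)             # lead in percentage points
moe = lead_margin_of_error(udf, ldf, n)

print(round(lead, 1), round(moe, 2))
```

The margin on the lead works out to about 3 percentage points, several times the lead itself, so on these numbers the survey cannot say which front is ahead.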

On balance, the results of Asianet-C Fore are at great variance with both the IMEG survey and the CES survey. Protests have already arisen in Kerala against the second Asianet-C Fore survey, alleging that it is politically motivated. One argument is that given Asianet’s BJP connection (its leading shareholder Rajeev Chandrasekhar is close to the BJP), the agency might have over-sampled from constituencies where the BJP has a stronger presence. The results, thus, show a 9 per cent vote share for the BJP in the State, which is highly unrealistic. This could have also led to biased estimates regarding how different communities vote; in regions where the BJP is strong, its major vote-base is the Hindu upper castes, mainly Nairs.

Of course, the final verdict is in the hands of voters. Only time will tell whether the Asianet-C Fore predictions are correct. However, opinion polls do influence voters’ decisions. Hence, it is important that the agencies concerned put the full methodological details of these surveys in the public domain. Asianet-C Fore appears less forthcoming in doing so, and this does raise doubts.

2 comments:

വര്‍ക്കേഴ്സ് ഫോറം said...

Of course, the final verdict is in the hands of voters. Only time would tell if Asianet-C Fore predictions are correct or not. However, opinion polls do have an influence in deciding voter’s decisions. Hence, it is important that the full methodological details of these surveys are put in the public domain by the agencies concerned. Asianet-C Fore appear less forthcoming in doing so and this does raise doubts.

Rajan Alexander said...

Their only agenda is to make the Lotus Bloom as a sub-plot and give some takeaway for the BJP from this election.

Asianet will end up with mud on their face.

http://exitopinionpollsindia.blogspot.com/2011/04/how-politically-manipulated-is-asianet.html