Small Relatively Absolute Data Analysis.

I have come across a very interesting article on Newslaundry titled “Think India has become more communal under Modi? The numbers will disappoint you.” Given that the blog has been written by an esteemed socio-politico-economic data analyst and a renowned unbiased journalist, I cannot criticize it. Anyway, how can I criticize any article now, given that I didn’t create a blog and do the same in 1984? I have seen people take big “U-turns” on their political opinions, but is it possible to take such “U-turns” with data? I don’t know.

So I just want to understand and learn some of the concepts. Only that much.

If there is one thing that we can trust the State on, it is its passion to duly register an FIR on every communal incident, even where the State itself is either complicit or, worse, the aggressor. Of course, given that the State we are talking about is UP, there should be no doubts about anyone’s sincerity and we can completely trust the data (both the registering of FIRs and the NCRB compilation). Hence, the first few paras of this article are very clear to me.

The problem starts from here. I have never seen discrete information, such as individual States, joined by “lines” (which normally represent continuity between data points). What does the line joining Chandigarh and Goa indicate? Of course, it is just a technical point, but I want to make sure that I am not missing any grand theory here.




Thus it was asserted that communal violence increased by nearly 25% in the first five months of 2015 under the Modi government as against the first five months of 2014 during the last days of the UPA. The actual raw numbers are 287 incidents in January-May 2015 as against 232 incidents in the corresponding period 2014.

Leaving aside the fact there’s a relatively small absolute difference between the two numbers and there’s no way to know if the difference is statistically significant

This paragraph is amazing. What exactly is “relatively small absolute difference”? I also want to learn how to calculate such relatively small absolute differences.
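As far as I can tell, a difference is either absolute (a count) or relative (a percentage of the starting value). Here is a minimal sketch of both for the two numbers quoted above; the function names are my own, purely for illustration.

```python
def absolute_difference(before: int, after: int) -> int:
    """Difference in raw counts (after minus before)."""
    return after - before


def relative_change_percent(before: int, after: int) -> float:
    """Change expressed as a percentage of the starting value."""
    if before == 0:
        raise ValueError("relative change is undefined when the starting value is 0")
    return (after - before) / before * 100


# Figures quoted in the article: January-May 2014 vs January-May 2015.
incidents_2014 = 232
incidents_2015 = 287

abs_diff = absolute_difference(incidents_2014, incidents_2015)
rel_change = relative_change_percent(incidents_2014, incidents_2015)

print(f"Absolute difference: {abs_diff} incidents")
print(f"Relative change: {rel_change:.1f}%")
```

That gives 55 more incidents, or roughly 23.7%, which is the “nearly 25%” figure. I still could not find a formula for the “relatively small absolute” kind.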

There seems to be a big dilemma about whether the numbers are “statistically significant”. The statistics that I have learned (and am still learning) seem to suggest that the data is “statistically very significant”. Statistical significance comes into the picture when we are analyzing a small sample and want to check whether the relationship exhibited by the sample is merely due to sampling or other such issues. Why should we bother about “statistical significance” when the data here refers to the entire population?
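Still, suppose someone insists on a test. Here is a rough sketch under a loudly stated assumption that is mine and not the article's: treat the two counts as independent Poisson draws, so that, given the total, the 2015 count is Binomial(total, 0.5) if the underlying rate did not change, and use the normal approximation to that binomial. The function name is hypothetical.

```python
import math


def equal_rate_test(count_a: int, count_b: int) -> tuple[float, float]:
    """Normal-approximation test of 'both periods share the same rate'.

    ASSUMPTION (not from the article): the counts are independent Poisson
    draws. Conditional on the total n, count_b ~ Binomial(n, 0.5) under the
    null, so z = (count_b - count_a) / sqrt(n).
    Returns (z, two-sided p-value).
    """
    total = count_a + count_b
    z = (count_b - count_a) / math.sqrt(total)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


z, p = equal_rate_test(232, 287)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under those assumptions the p-value comes out below the conventional 5% threshold. Whether such a test means anything for a complete count of incidents, rather than a sample, is exactly the question I am asking.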

I know that because of 1984 and all that… But one last question.

To take an extreme case, suppose we compare March 15 across any two years and find that there was one incident last year and two incidents this year on that date. Would we then be entitled to assert a 100% increase in communal violence on that day? Of course, not. 

If there was ONE communal incident last year and there are TWO communal incidents this year, I would think that there is a 100% increase, and I thought that was simple mathematics. “Of course, not.” I must be very wrong. Like Alzebra, there must be some Cow-zebra that will explain why it is “of course, not” 100%. Any tutorial on that will be very helpful.
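For the record, this is the simple mathematics I had in mind, assuming the ordinary percentage-change formula is what “increase” means here:

```latex
\[
\text{percentage increase}
  = \frac{\text{this year} - \text{last year}}{\text{last year}} \times 100\%
  = \frac{2 - 1}{1} \times 100\%
  = 100\%
\]
```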






