IGF 2022 Launch / Award Event #63 Data justice India report: Book launch

Tuesday, 29th November, 2022 (06:00 UTC) - Tuesday, 29th November, 2022 (07:00 UTC)
Large Briefing Room

Digital Empowerment Foundation
Ananthu RA & Jenny Sulfath - Researchers
Onsite Moderator: Osama Manzar - Founder Director, Digital Empowerment Foundation
Online Moderator: Jenny Sulfath
Rapporteur: Vineetha Venugopal





10. Reduced Inequalities
11. Sustainable Cities and Communities
16. Peace, Justice and Strong Institutions

Targets: Our book intends to map inequalities that arise from algorithms and AI enabled systems, and trace the gaps in policy and practice of recognizing and navigating these inequalities. In an increasingly data centered world, societies and 'smart cities', the SDGs corresponding to our project are 10, 11 and 16: to reduce inequalities, make cities and societies sustainable, inclusive and just.




As Policy Pilot Partners of the Alan Turing Institute and the Global Partnership on AI, DEF undertook a research program, Advancing Data Justice Research and Practice, aimed at broadening the discourse around data justice to see how it intersects with problems of exclusion and social justice, and at incorporating voices from the global south while doing so. The report contains interviews and interactions we had with several stakeholders, including developers, policymakers and impacted communities. The report and the interactions have been edited into a book on AI and data issues in India that invites people to join the discourse, and it will be released at IGF.

As this is a book launch, most of the conversation could be held via online conferencing in case some of our organizers cannot attend. One onsite moderator would take questions and answer them, and the rest of the team would do the same via Zoom or other video conferencing tools.

Key Takeaways

Automation and datafication that do not take into account ground realities, local contexts and systemic injustices further marginalise the already vulnerable: for example, homeless populations, transgender persons and other gender/sexual minorities, religious minorities, nomadic populations and those lacking textual or digital literacy. Due to multiple marginalities, data injustice has a cumulative impact on them often resulting in multiple lev

Call to Action

To counter data injustice, data policies must be grounded in internet universality. As such, they must be open, accessible, rights-based, citizen-centric and formulated through multi-stakeholder consultations, be it at national or international levels. The first step in reforming policies should be consultations with marginalised/excluded groups. Then there should be national-level audits of the data sector. Ethics companies ensuring diversity i

Session Report


Award event 63: Data Justice Report Launch

Discussants: Ananthu RA & Jenny Sulfath, Onsite Moderator: Tuisha Sircar, Online Moderator: Osama Manzar, Rapporteur: Vineetha Venugopal. Chief guest: Dorothy Gordon, Chair of the UNESCO Information For All Programme and Board member of the UNESCO Institute for Information Technologies in Education.

Introductory remarks (Tuisha): Data injustice poses challenges to holistic empowerment. We need a holistic approach to access, agency and empowerment.

Dorothy: We all know well the role that universal access to information and knowledge plays in sustainable development by fostering equitable societies. Today we also recognise that the policy approach towards Internet Universality should be human Rights-based, Open, Accessible to all and nurtured by Multi-stakeholder participation (R.O.A.M). Employing this approach requires a deep dive into the nuanced ways in which data is designed, used and implicated in processes that may hinder the development of equitable societies. Here, data justice plays an important role in recognising how data becomes a tool for reinstating injustice. By analysing these nuances, we may collectively be able to reimagine an alternative for a sustainable future. In that regard, I congratulate DEF on the release of their report on Data Justice in India and hope it leads to more in-depth perspectives on Internet Universality that is rights-based, open, accessible to all and deeply nurtured by multi-stakeholder participation. I hope this engagement translates into more inclusive and citizen-centred data norms in the times to come.


The Assam National Register of Citizens, or NRC, is one case study discussed in the report. It is a concern not only for data justice but also for human rights and statelessness. The Assam NRC exercise was conducted twice, once in 1951 and once in 2018. To be recognised as a citizen under the NRC, one has to show that their ancestors were part of the 1951 NRC or on a voters list from 1961-1971. I will give a brief overview of the history of Bengali immigrants in Assam to contextualise it. This contextualisation is essential for the data justice project because power and identity shape how data is collected, and a system powered by exclusionary data is directly relevant to this discussion. The history of the Assam NRC can be traced back to 19th-century British rule in Assam, when a large number of people from Bengali-speaking East Bengal were brought to Assamese-speaking Assam as workers to grow more food. East Bengal later became part of East Pakistan and is now Bangladesh. A majority of the labourers brought in were Muslims. A significant number of Muslims were also settled on the river islands of Assam, where floods are common, so they are a shifting population. When the government decided to update the NRC list, WIPRO was subcontracted to collect and sort the data. They deployed a Document Segregation and Meta Data entry software to digitise the data, and a family tree algorithm to match it.

As I mentioned earlier, the first layer of exclusion happens at the entry level itself. In many places, the first NRC was not conducted. In other cases, people had a document as proof of their enrolment, but the corresponding data had been lost by the government. Secondly, the family tree algorithm demanded that all the data (names, spellings and addresses) submitted by everyone claiming legacy from a single person match to a level of exactness. The system rejected the application if the data did not match. This applies to all your cousins, nephews and every other relative. Our respondent reported that people had to travel to far-away villages to track down relatives and get this exact information. The islands also have the lowest literacy rate. It also meant that people who are estranged from their families had no way to get the details that were required. Further, people who were not born out of legitimate marriages were excluded as well, because they did not have an ancestry to trace back to.
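The brittleness of the exact-match requirement described above can be illustrated with a minimal sketch. This is not the actual NRC software, whose internals are not described in this report. The record fields, the function name `exact_match_accepts` and all names and places in the sample data are invented for illustration. The sketch assumes only what the respondents reported: every claim tied to the same ancestor must agree exactly, or the claims are rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyClaim:
    """One applicant's claim of descent from an ancestor (hypothetical schema)."""
    applicant: str
    ancestor_name: str
    ancestor_village: str


def exact_match_accepts(claims):
    """Accept a family only if every claim records the ancestor identically.

    Comparison is a plain string equality check, so any spelling or
    transliteration variant counts as a mismatch.
    """
    if not claims:
        return False
    reference = (claims[0].ancestor_name, claims[0].ancestor_village)
    return all(
        (c.ancestor_name, c.ancestor_village) == reference for c in claims
    )


# Invented sample data: three relatives naming the same ancestor.
family = [
    LegacyClaim("applicant_a", "Rahim Uddin", "Village X"),
    LegacyClaim("applicant_b", "Rahim Uddin", "Village X"),
    LegacyClaim("applicant_c", "Rahim Udin", "Village X"),  # one-letter variant
]

print(exact_match_accepts(family[:2]))  # True: both claims agree exactly
print(exact_match_accepts(family))      # False: one variant fails everyone
```

In this sketch, a single spelling variant entered by one relative causes the whole family's claims to fail, which is why applicants had to coordinate details with distant or estranged relatives.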

The NRC is also connected to two other systems of exclusion: the D voters list (the doubtful voters list) and the Assam border police. Anyone suspected of being an illegal immigrant was put on the D voters list. The Assam border police also maintain a list of people suspected to be immigrants, which is itself an exclusionary data set. This compounds the exclusion of more than three lakh people. The Assam border police are mainly deployed on the river banks and river islands. Even when they submitted the required documents, people marked by these two systems were excluded from citizenship.

Homelessness: Another population that keeps moving from one place to another is the homeless and circular migrants. India has a unique ID, Aadhaar, through which most welfare policies are made accessible to people. While homeless people can get an Aadhaar card based on a shelter home address, the majority of them do not have a mobile phone with which their Aadhaar can be registered. Another issue is that, even though Aadhaar is not mandatory for hospital admissions, one needs an Aadhaar card to register deaths and births. An example is TB patients. India has the highest number of TB patients, and the rate is high among the homeless population. The majority of them are refused admission to a hospital without an Aadhaar card. Further, for TB patients to get an allowance for their recovery, around 10 dollars a month, they need to have a bank account, and a bank account requires address proof. The issue of the homeless population connects to the question of representation in data justice. Who is counted, and who is excluded? This is an important question because AI and ML-based systems are built on existing data sets, and the homeless are not counted. A system built on these existing exclusions will further marginalise and criminalise them.

Ananthu: Over the course of their work, the research team at the Alan Turing Institute has tried to provide a broader frame for the idea of data justice, one that is not limited to the problems of privacy or security (which are real problems), and has attempted to fill this gap in both research and practice. Most of our research was guided by their work. They identified six pillars of data justice, which are not separate or mutually exclusive, but overlap.

  • Six identified pillars, in brief:
    • Power: Understanding (and combatting) existing, deep-rooted patterns of dominance and power structures
    • Equity: Seeing the long-term patterns of inequality and transforming them to be closer to social justice
    • Access: to data, resources, and innovation
    • Identity: critiquing erasure and othering, exposing binaries.
    • Participation: Meaningful representation and inclusion of diverse populations and views
    • Knowledge: not limited to a dominant version, but epistemically pluralist and acknowledging the variety of it.

They have worked with several organisations from the global south to expand their scope and to understand the issues in a way that is not limited to how they are currently viewed in Europe or other parts of the global north. We have used these pillars to look at some of the cases we spotted in India, either involving data and AI-related tools or involving narratives of exclusion and invisibilization driven by data. We talked to the communities affected, and to the developers and policy people who sought to implement these tools. We had conversations with them through interviews and workshops (mostly conducted digitally during the third wave of the pandemic). Policymakers tried to defend their choices and explain how the moves were for larger benefits, developers talked about the issues they faced while working on these tools, and we listened to the narratives of discrimination, erasure, and bias that the communities faced as some of these tools were rolled out.

Our conversations with developers had some interesting, mixed results. To put it briefly, there are several really innovative attempts at using AI for social interventions, such as the automated pest detection system in Andhra Pradesh, which has helped farmers save crops. However, most developers are unaware of, and not trained in, the social impact of these technologies or the relation they share with society. Some felt there was no need for non-experts (the public) to interfere in technical processes. Many others think otherwise. They understand how the community needs to be engaged to figure out where the problem is. This was seen in the case of one TB detection software, where the issue was not really with the use of AI for detecting TB, but with welfare policies that prevent patients from recovering from the disease, due to a lack of resources for nutritious food or even for continued access to medicines. Examining these workflows requires a lens for understanding the underlying social issues, not just a technical solution to be implemented.

What we gathered from the conversations with developers is an issue of representation (the participation pillar). The problem of the demographic composition of the workforce and of decision-makers remains, and it needs to be solved, sometimes institutionally. An ethics committee that can ensure this diversity is one way of tackling exclusion. Similarly, courses on software development (not just AI/ML) should ideally include a component of social sciences, particularly data justice. This means a curriculum-level change is needed to make development teams aware of the socio-political impacts of their work, and of how their work can help challenge existing structures of power and create a more equitable world.

Osama: Aadhaar was brought in for data justice and social empowerment. Digitisation can offer a lot of efficiency in welfare distribution. But at the same time, the purpose of digitisation is that each and every person has to benefit. We need a citizen-centric data policy. Even if one person falls behind, it is not a data policy. In India, we have given more importance to governance efficiency than to citizen services.

  • A pregnant lady lost her child due to not having Aadhaar
  • Weaponisation of Aadhaar: Bureaucracy starts using it as a means of exclusion, subjugation
  • Each and everything through data, digital
  • Whether your citizen has access to data: We don't
  • It is very important that we upgrade policies keeping in mind citizens - people who are not connected - how do you serve them with digital efficiency - but without violating their privacy


  • National-level audits of the data sector
  • Consultations with groups that are excluded - the first step 
  • Representation or participation pillar to be strengthened: Ethics committee ensuring diversity
  • Courses on software development should cover components of data justice