IGF 2021 Town Hall #42 Building the wiki-way for low-resource languages

Time
Wednesday, 8th December, 2021 (14:05 UTC) - Wednesday, 8th December, 2021 (15:05 UTC)
Room
Conference Room 7
Issue(s)

Economic and social inclusion and sustainable development: What is the relationship between digital policy and development and the established international frameworks for social and economic inclusion set out in the Sustainable Development Goals and the Universal Declaration of Human Rights, and in treaties such as the International Covenant on Economic, Social and Cultural Rights, the Conventions on the Elimination of Discrimination against Women, on the Rights of the Child, and on the Rights of Persons with Disabilities? How do policy makers and other stakeholders effectively connect these global instruments and interpretations to national contexts?
Inclusion, rights and stakeholder roles and responsibilities: What are/should be the responsibilities of governments, businesses, the technical community, civil society, the academic and research sector and community-based actors with regard to digital inclusion and respect for human rights, and what is needed for them to fulfil these in an efficient and effective manner?

Tutorial - Auditorium - 30 Min

Description

Even though the Universal Declaration of Human Rights emphasizes equitable access to information and participation in the knowledge commons for all, the real-world implementation of development programmes is far from equitable for low-resource languages. Many speakers of indigenous, endangered and other marginalized languages around the world face significant challenges in having agency in the governance of the internet. What does Wikipedia, which was born entirely by merging citizen science and federated knowledge sharing, have to teach the larger internet governance ecosystem? Wikipedia and the projects surrounding it (known as Wikimedia projects) have also created barriers to both access and participation because of the higher participation from developed countries and the far lower participation from developing countries. How can such lessons help in creating low entry-level barriers to the internet? This session will capture the imagination for rebuilding the knowledge commons so that low-resource language speakers can contribute, particularly orally, without having to worry about an "official" writing system, and can establish their agency without struggling with an "authority control".

Currently in creation, Abstract Wikipedia reflects the lessons learnt collectively in the 20-year lifetime of the Wikipedia movement. At the foundation of Abstract Wikipedia lies Wikidata, one of the largest databases in the world and arguably the largest crowdsourced database created so far, built almost entirely by volunteers. Though low-resource communities are able to contribute easily to expanding the scope of local knowledge on the internet, the larger issue of representing the majority knowledge (oral history) in a minority medium (written) remains the same. The question that civil society and other stakeholders really need to ask is how the current model of the internet can be broken in a radical manner to make space for federated and distributed knowledge sharing. This session will be a response from civil society to identify the wiki-ways that many indigenous and endangered language speakers, and other marginalised groups, have in reserve thanks to years of social practices that complement decision-making.

Inclusion is at the centre of this proposal, and we will ensure that anyone, irrespective of their timezone, is able to access the content of this session in perpetuity. We will ensure accessibility, provide multilingual summaries after the session, and collaborate with communities in our network to localize the key outcomes of the session. Even though our current submission includes only three participants, we are working with other potential panelists and collaborators. Working in a space outside English and other dominant languages makes this extremely challenging, and we are still finding ways to address it as we submit this proposal.

Organizers

O Foundation and Rising Voices

Speakers
  1. Eddie Avila is the director of Rising Voices, an initiative of the citizen media organization Global Voices, and has been one of the key advocates of language digital activism for the protection and growth of low-resource languages.
  2. Amrit Sufi is the coordinator of the Oral Culture Transcription project, a toolkit that will enable people to access information on uploading media of endangered languages, and has previously worked on documenting folk songs in the Angika language of India.
  3. Sardana Ivanova is a doctoral student in Computer Science at the University of Helsinki. She is interested in low-resource languages. She conducts research and works on the development of various language technology tools for the support of the Sakha language, mostly spoken in Russia’s Far East.
  4. Mahir Morshed is a doctoral student researching articulatory features and prosodic unit discovery in speech processing. As a Wikimedian, he has recently been contributing to Wikidata's lexicography and is examining ways to make it usable for text generation.
Online Moderator

Subhashish Panigrahi, O Foundation

Rapporteur

Sailesh Patnaik, O Foundation

SDGs

17.16
17.6

Targets: 17.6: All three key organising groups/communities collaborate internationally, especially through North-South and South-South cooperation. Access to knowledge and building equitable and inclusive guidelines and methodologies for furthering the growth of community participation have been a primary focus of these organisations. 17.16: We have all been contributors for many years to the Wikipedia movement, which focuses on a community-driven global partnership for sustainable knowledge sharing. Similarly, the OpenSpeaks project, built at the O Foundation, has been an avenue for educating archivists to document indigenous and endangered languages in audio-visual media. Rising Voices is a large network of language digital activists and was recently awarded a UNESCO grant for developing an online toolkit to address the growing challenges facing low-resource language communities, particularly those emerging during COVID-19.

Key Takeaways

Stakeholders must work collaboratively to support low-resource language communities in addressing accessibility issues and in removing the entry-level barriers of platforms.

Language technology developers and other stakeholders who are not native speakers must work closely with native speakers and develop language technology based on their advice.

Call to Action

Creating spaces for peer learning exchange can be a very powerful tool for protecting and growing the use of many low-resource languages, and stakeholders must prioritize the creation of such spaces.

Stakeholders must support the creation of Open Educational Resources (OER) for new and potential contributors who are speakers of low-resource languages, to remove entry-level barriers to open and collaborative platforms such as Wikipedia.

Session Report

Panelists

  1. Eddie Avila, Director of Rising Voices, an initiative of the citizen media organization Global Voices. He has been one of the key advocates of language digital activism for the protection and growth of low-resource languages.
  2. Amrit Sufi, Coordinator of the Oral Culture Transcription project, a toolkit that will enable people to access information on uploading media of endangered languages. She has previously worked on documenting folk songs in the Angika language of India.
  3. Sardana Ivanova, Doctoral student in Computer Science at the University of Helsinki. She is interested in low-resource languages. She conducts research and works on the development of various language technology tools for the support of the Sakha language, mostly spoken in Russia’s Far East.
  4. Mahir Morshed, Doctoral student researching articulatory features and prosodic unit discovery in speech processing. As a Wikimedian, he has recently been contributing to Wikidata's lexicography and is examining ways to make it usable for text generation.

Moderator

  • Subhashish Panigrahi, Co-founder and Director, O Foundation. Subhashish is a filmmaker, community organizer and cross-disciplinary researcher.

Rapporteur

  • Sailesh Patnaik, Co-founder and Director, O Foundation. Sailesh is a Wikimedian with experience in institutional liaison for open knowledge in the government and educational spaces.

Abstract

When it comes to internet governance, most speakers of indigenous, endangered and other low-resource and marginalized languages around the world face a significant challenge, both in amplifying their issues through participation and in having their languages benefit from that process. As Whose Knowledge? underlines, a mere 7% of the 6,500 - 7,000 languages spoken around the world are captured in published material (Vrana et al., 2020). In this Internet Governance Forum 2021 panel titled "Building the wiki-way for low-resource languages", the panelists focused primarily on strategies in language digital activism and open knowledge platforms, and shared recommendations to grow and sustain low-resource languages in the digital sphere.

Key issue areas

In this panel, organized by the O Foundation in collaboration with Rising Voices, the panelists shared the response of civil society, particularly in three broad and intersecting sectors, to the larger, systemic issue of the low availability of resources in many marginalized languages and the low participation of their native speakers:

  1. Language digital activism, a loosely defined term that has gained popularity in the last decade and includes all kinds of activism in online spaces for the protection and growth of languages
  2. Volunteer-led movements such as the open knowledge movement, of which Wikipedia is a part
  3. Academic and other research initiatives that focus on building technology for the overall growth of languages, especially low-resource languages

Key questions

Some of the key questions that the panelists addressed around existing community strategies, processes and platforms included:

  • How is language digital activism helping the active engagement of stakeholders in the low-resource language domain?
  • How are other open and collaborative processes and platforms helping low-resource languages?
  • How is the creation of computational linguistics tools making a long-term shift in access to information in an equitable and decentralized manner, especially in the context of many low-resource languages?
  • How are Wikipedia and the Wikimedia projects in general, and the ongoing Wikidata initiatives in particular, helping speakers of low-resource languages reclaim their space on the internet?

Additionally, the questions around specific movements and platforms included:

  • How is language digital activism already moving, and aiming to move, the needle on access to linguistic rights and access to knowledge?
  • What long-term shift do many activists envision as a direct or indirect result of the ongoing initiatives?
  • What are the lessons and recommendations for different stakeholders working on low-resource languages, based on the work on creating language tools?
  • What are the building blocks and the low entry-level barriers in the Wikimedia world for native speaker communities and other stakeholders of low-resource language communities?

Background

To set the context on language digital activism, Eddie Avila, Director of Rising Voices, shared how the scarcity of key resources, such as writing systems in the context of oral-only languages or consensus on existing writing systems, access to the internet, consensus on technical terms within a speaker community, and even political implications, remain major barriers to language digital activism.

Amrit Sufi, a native speaker of Angika (an endangered language from India) and co-facilitator of the recent "Language Digital Activism Workshops for India" series organized by Rising Voices (Language Digital Activism Workshops for India · Rising Voices, 2021), emphasized that the medium of instruction in her schools was Hindi and English, which directly contributed to the gradual decline of her native language, Angika, toward eventual cessation. She identified the sense of elitism associated with not speaking native languages at home, often enforced by parents, as a primary reason for the disappearance of many languages from homes.

Sardana Ivanova, a speaker of the Sakha language and a doctoral student in Computer Science at the University of Helsinki, demonstrated from her experience how collective volunteer efforts for creating content -- through Wikipedia -- eventually help Natural Language Processing (NLP) researchers build tools for better use of languages on digital platforms.

Mahir Morshed, a Wikipedia/Wikidata contributor and a doctoral student researching articulatory features and prosodic unit discovery in speech processing, shared inputs to the call to action of this session based on recent developments around Wikidata's lexicography, as these tools are paving the way for text generation across languages (Morshed, 2021).

Key takeaways

  1. Stakeholders must work collaboratively to support low-resource language communities in addressing accessibility issues and in removing the entry-level barriers of platforms.
  2. Language technology developers and other stakeholders who are not native speakers must work closely with native speakers and develop language technology based on their advice.

Some of the panelists directly addressed the questions around the aforementioned takeaways:

  • We're creating a space for peer learning for language digital activism so that activists can expand their work through such long-term partnerships and share their work with the larger community. (Eddie Avila)
  • Many low-resource language digital activists are currently attempting to "normalize" the use of their languages, as these languages are not in active use in public discourse. (Amrit Sufi)
  • Language technology developers who might not be native speakers must work closely with native speakers. (Sardana Ivanova)

Call to Action

  1. Creating spaces for peer learning exchange can be a very powerful tool for protecting and growing the use of many low-resource languages, and stakeholders must prioritize the creation of such spaces.
  2. Stakeholders must support the creation of Open Educational Resources (OER) for new and potential contributors who are speakers of low-resource languages, to remove entry-level barriers to open and collaborative platforms such as Wikipedia.

Some of the panelists' inputs and recommendations that helped arrive at the aforementioned call to action include:

  • There have been many initiatives recently from the Wikimedia Foundation to improve accessibility on Wikipedia and Wikimedia projects -- this has helped with the growth of many low-resource languages. (Mahir Morshed)
  • Translating Wikidata descriptions is one of the easier ways for newcomers to contribute in their low-resource language. (Mahir Morshed)
  • Documenting oral culture as audio and video helps increase visibility for many low-resource languages. (Amrit Sufi)
  • Creating spaces for peer learning exchange can be a very powerful tool for many low-resource languages to protect and grow use of languages. (Eddie Avila)

Avila also emphasized that whether or not language digital activism is moving the needle on access to linguistic rights and access to knowledge is hard to measure. While there is anecdotal evidence of impact, there are no clearly defined processes in most cases, as each community contributes in its own volunteer spaces and is primarily driven by passion. Out of respect for their volunteerism, attempts to measure the impact of their work are often not prioritized.

References

Vrana A, Sengupta A, Pozo C and Bouterse S (2020). Decolonizing the Internet’s Languages – Summary Report. Whose Knowledge?, 20. (accessed 19 December 2021).

Language Digital Activism Workshops for India · Rising Voices (2021). Rising Voices. Available at https://rising.globalvoices.org/language-digital-activism-workshops-for-india/ (accessed 19 December 2021).

Morshed M (2021). Preparing languages for natural language generation using Wikidata lexicographical data. Septentrio Conference Series (3).

Report compiled by

  • Subhashish Panigrahi, O Foundation
  • Sailesh Patnaik, O Foundation

Date of publication

19 December 2021

Copyright

2021. Subhashish Panigrahi and Sailesh Patnaik. O Foundation. CC-BY-SA 4.0.

Cite as

Panigrahi S and Patnaik S (2021). Building the wiki-way for low-resource languages: Session Report. O Foundation. Bhubaneswar. DOI: http://dx.doi.org/10.17613/df9y-nz19 (accessed 19 December 2021).

Note

This report is also available for download as a standalone report on Humanities Commons.