OpenSpeaks: Open Toolkit for Multimedia Documentation of Indigenous Languages

WikiJournal Preprints
Open access • Publication charge free • Public peer review

This article is an unpublished pre-print undergoing public peer review organised by the WikiJournal of Humanities.


First submitted: 5 May 2021

Suggested (provisional) preprint citation format:
Subhashish Panigrahi. "OpenSpeaks: Open Toolkit for Multimedia Documentation of Indigenous Languages". WikiJournal Preprints. Wikidata Q106806074.

License: This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction, provided the original author and source are credited.

Editors:

Eystein Thanisch

Reviewers:

Sim Tze Wei
T. B. Dinesh


Abstract

Language field documentation experts have often expressed concerns about unethical and colonial practices that have led to poor "data ownership" by native speakers. I argue that the citizen model of language documentation, especially when done by non-native speakers, must make the agency of native speakers a design principle of its practice. This paper discusses how this could be achieved through long-term collaboration with the language community, documentation together with the community, and sharing openly licensed archived materials instead of paywalling them, thereby creating community ownership over the archiving process and the archived materials. OpenSpeaks, a toolkit of open educational resources that I created based on learning from citizen-led language documentation and archiving processes, is used here as a testing ground for developing a learning framework and resources for documenters and archivists who focus on low-resource languages. OpenSpeaks was designed with indigenous, endangered, and other low-resource languages in mind, particularly those lacking audiovisual documentation. While highlighting the lack of socio-economic resources in this sector, and of the other resources that depend on them, this paper critiques the toolkit from an anthropological and social justice perspective, both for what it offers citizen language documenters and archivists and for its gaps.

Introduction

The rapid decline of indigenous, endangered, and other low-resource languages has been a growing concern for civil society because of its impact on human knowledge. These marginalised languages, often on the verge of extinction, rarely receive linguistic research and documentation, even though such efforts could scientifically aid their conservation. The citizen science model of archiving both traditional and contemporary aspects of low-resource languages has been practised through activism.

The availability of resources to native speaker communities plays a crucial role in the intergenerational transmission of any language, which the UNESCO Ad-hoc Expert Group on Endangered Languages identified as the key factor in language endangerment.^[1] These resources can be financial, human, socio-political, institutional, technical, or educational in nature, and their absence or insufficiency directly affects language endangerment. It is therefore important to consider resource availability when designing language documentation strategies. Documenting indigenous and endangered languages through digital media can help sustain and grow the use of these languages. Conversely, a lack of online platforms and other digital media, or of knowledge about them, can contribute to language endangerment. Digital interventions and pathways have proven effective not only in normalising the use of indigenous languages online but also in creating a ripple effect that leads to wider use of these languages in a specific location.^[2]

The rise of pervasive technology, particularly cheaper smartphones, has begun to address some earlier challenges to multimedia documentation, in different ways for different language documentation stakeholders. Linguists also value "documentary linguistics," which captures face-to-face interactions between native speakers.^[3] The Rising Voices project at Global Voices encourages young speakers to actively use their native languages through various forms of online, digital, and in-person engagement.^[4] Our research during the UNLOCK acceleration program showed that language digital activism includes a wide range of initiatives, which can make it confusing for new activists to determine the best tactic or strategy to achieve their goals.^[5]

In response to the UNESCO International Decade of Indigenous Languages campaign^[6] and beyond, both professional field researchers and activists, particularly citizen documenters and citizen archivists (CDCA), are working to create multimedia documentation of low-resource languages, especially endangered ones. Citizen documenters and archivists, who may be self-taught or lack formal training in field linguistic documentation, and who may or may not be native speakers, often have limited access to institutional, financial, and other relevant resources, and limited means to afford them. OpenSpeaks was created in 2017 as a standalone project hosted at https://openspeaks.com and was integrated into Wikiversity in 2019.^[7] Its Open Educational Resources were intended to give such activists both strategic and operational know-how for documenting indigenous, endangered, and other low-resource languages. OpenSpeaks has its roots in grassroots activism, where language documentation work is led by citizen archivists who are either native speakers of low-resource languages or in close contact with native speakers. The author's close involvement in citizen language documentation initiatives has been useful in learning about the different needs for a resource that can serve both strategic and operational purposes for citizen archivists.

I argue that the citizen model of language documentation must focus on building openly licensed language content and Open Educational Resources, leading to native-speaker community agency over the archiving process and the archived materials. I also discuss the challenges and opportunities for scaling up such initiatives and the need for ongoing support and capacity building for citizen documenters and citizen archivists. Finally, I outline the implications of this work for documentary linguistics and the broader field of language documentation and conservation.

Background

Language activists, more specifically citizen documenters and archivists, often record and archive language materials to preserve diverse knowledge(s), creating community media as a direct outcome of this practice. Sociolinguistically marginalized communities also use language documentation as a technological tool for sharing indigenous knowledge and for building a democratic and inclusive environment. As UNESCO emphasized in 2019, "In today’s world, digital literacy, access to broadband connectivity and quality content, including in local languages, are also prerequisites for the fulfillment of our human rights and fundamental freedoms, as well as for our participation in the development of societies."^[8] Documentary linguistics, or language documentation, has evolved as an academic practice and linguistic discipline for documenting languages and keeping historical audiovisual records of languages and cultures. Such archiving furthers research and enables the creation of resources for teaching both first and foreign languages, provided the language materials are made widely available. Linguists have also documented their practice to help documentary linguists and language activists alike, as seen in the paper "Keeping it real: Video data in language documentation and language archiving"^[3], the course "Archiving for the Future: Simple Steps for Archiving Language Documentation Collections"^[9] and the online guide "Language Sustainability Toolkit".^[10] Citizen Language Documentation and Archiving (CLDA) is a community-based practice inspired by citizen science, participatory scientific research conducted by people who might not be trained as scientists.^[11] Citizen language documenters and archivists are activists who either come from the speaker communities whose languages are being documented or are community-based activists with an interest in, and access to, the resources required for language documentation and archiving.
This work can happen through collective, institutional or individual efforts. The resources used for documenting languages in such contexts, the formats of the archived media and the ways they can be used, and the hosting platforms all vary widely. Linguists have further noted the diversity of activist-led documentation: some activists focus on storytelling and others on aesthetic outcomes, and not all such documentation captures linguistic data relevant to future linguistic research and development.^[3]

Identifying issues with documentation

The lack of access to language technology has been one of the major factors behind the digital divide and the further financial exclusion of many marginalized communities.^[8] While the need for multimedia documentation of languages is paramount, anecdotal evidence we have received from many native-speaker activists suggests that human, financial, educational, infrastructural, political and other essential resources are scarce for most indigenous, endangered and other marginalized languages. Additionally, multimedia recordings are more expensive to produce than textual content. Textual content, however, relies on multiple critical factors, such as an established writing system, an accepted Unicode standard for encoding the script, reliable script rendering across operating systems on both mobile devices and computers, and Unicode-compliant typefaces. Technical factors aside, the widespread use of native writing systems in many indigenous communities with a strong oral culture is a slower process, and it is adversely affected by neighboring, dominant writing systems. For instance, the official status of India's Odia language (written in the Odia script) and the historical push for its wider use in education, governance and mass media, together with the lack of such support for neighboring indigenous languages such as Santali and Ho, has contributed strongly to the slow spread of textual content in Ho and Santali. Furthermore, Anderson and Gomango (2016) keenly observe an internal neocolonialism and ethnolinguistic hierarchy being built in the Juray (Jurai) Sora indigenous cluster of Odisha on the basis of the existing social discrimination of the caste system.^[12]
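To make the point about script encoding concrete, the sketch below counts the characters in a string by Unicode block, using the block ranges the Unicode Standard assigns to Odia (U+0B00 to U+0B7F), Ol Chiki (U+1C50 to U+1C7F, used for Santali) and Warang Citi (U+118A0 to U+118FF, used for Ho). It is a minimal illustration, not an OpenSpeaks tool: the helper names `script_of` and `script_profile` are hypothetical, and a successful check only says that text is encoded in the expected script, not that a given device has a typeface able to render it.

```python
from collections import Counter

# Inclusive Unicode block ranges for the scripts discussed in the text.
# Ol Chiki is used for Santali, Warang Citi for Ho; Odia is the dominant
# neighbouring script.
SCRIPT_BLOCKS = {
    "Odia": (0x0B00, 0x0B7F),
    "Ol Chiki": (0x1C50, 0x1C7F),
    "Warang Citi": (0x118A0, 0x118FF),
}


def script_of(char: str) -> str:
    """Return the block name for `char`, or a coarse fallback category."""
    cp = ord(char)
    for name, (start, end) in SCRIPT_BLOCKS.items():
        if start <= cp <= end:
            return name
    if char.isspace():
        return "whitespace"
    if cp < 0x80:
        return "ASCII"
    return "other"


def script_profile(text: str) -> Counter:
    """Count characters per script, ignoring whitespace (hypothetical helper)."""
    return Counter(s for s in map(script_of, text) if s != "whitespace")


if __name__ == "__main__":
    # U+1C5A..U+1C5C are Ol Chiki letters; "ok" is plain ASCII.
    print(script_profile("\u1c5a\u1c5b\u1c5c ok"))
```

A text mostly counted as "other" would suggest it was typed with a legacy, non-Unicode font encoding rather than the script's assigned code points.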

When it comes to audiovisual recording, recording environments vary with financial and spatial affordability, technical know-how and the purpose of the recording. Natural, conversational recordings of a language are often made as field recordings where the speakers live or work. Such recordings provide context about the speaker's life as well as the recorded content. As Seyfeddinipur and Rau observe, field recording is in some cases also shaped by creative or aesthetic considerations.^[3] To avoid distracting noise, especially for linguistic studies or conventional broadcasting, controlled environments such as scripted recording inside a studio are also used to ensure very high-quality media. Apart from helping the study of grammar, vocabulary, and oral traditions, such recordings can later be used to create speech synthesis systems.^[13] Given the high cost of studio recording, and depending on the actual purpose of the recording, productions also use hybrid approaches.

Demonstrating the informal use of a language, and designing multimedia literacy programs from documented multimedia evidence, often requires recording key information about the speakers as metadata. A speaker's age, gender, socioeconomic stratum, and the influence of dominant or other non-native languages on their native language all have a deep impact on their overall speech. While documenting such information is essential, how much of it can be captured depends on what the archivist can afford.
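The speaker metadata described above could be kept as one small structured record per speaker. The sketch below is a hypothetical schema, not one prescribed by OpenSpeaks; the class name `SpeakerMetadata` and all field names are assumptions. Most fields are optional, reflecting that what gets recorded depends on the archivist's resources, and the record can report which fields are still empty.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SpeakerMetadata:
    """Hypothetical per-speaker metadata record; field names are illustrative."""

    speaker_id: str                       # identifier chosen by the archivist
    native_language: str
    age: Optional[int] = None             # None if not recorded or not shared
    gender: Optional[str] = None          # as self-described by the speaker
    other_languages: List[str] = field(default_factory=list)  # dominant/non-native influences
    socioeconomic_notes: str = ""
    dialect: str = ""

    def to_json(self) -> str:
        # ensure_ascii=False keeps native-script values readable in the file
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)

    def missing_fields(self) -> List[str]:
        """Names of fields left empty, e.g. to fill in later if resources allow."""
        return [k for k, v in asdict(self).items() if v is None or v == "" or v == []]


if __name__ == "__main__":
    record = SpeakerMetadata("spk-001", "Santali", age=64, other_languages=["Odia"])
    print(record.to_json())
    print(record.missing_fields())
```

Reporting missing fields, rather than rejecting incomplete records, lets an archivist with limited means still archive a recording and return to the metadata later.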

The process of documentation tends to vary from archivist to archivist. For instance, documentary linguistics as a practice focuses on collecting rich linguistic data of a language,^[3] such as recordings of everyday conversation in a marketplace, whereas documentary filmmakers focus on aesthetics and storytelling. Training volunteers to create informational videos, as in the virALLanguages project, which helps create COVID-19 awareness videos in indigenous and minority languages, can also count as language documentation.^[14]

Citizen Language Documentation and Archiving (CLDA)

OpenSpeaks frames Citizen Language Documentation and Archiving (CLDA) as an extended form of language digital activism, in which citizen documenters and archivists document languages in audiovisual form. Based on consultations with some Canadian indigenous language communities, Littell et al. (2018) emphasize that while speech technologies are desired, there has been little progress in building them.^[15] Community-led efforts, including collaborations between communities and an internal or external multimedia archivist, can result in an organic and gradual development of the resources required for speech technology. Some of the foundational resources are recordings of the pronunciation of all or most words in a language by people of all genders and in different dialects. The wider scholarship in the artificial intelligence (AI) and machine learning (ML) disciplines that intersect with speech technologies emphasizes that a lack of adequate and diverse data is responsible for the exclusion of historically marginalized peoples.^[16] Slow, community-led audiovisual data collection, as opposed to occasional interventions, presumably has the natural advantage of reaching more individuals in a community across different gender identities and socioeconomic groups, as well as capturing the ecological and environmental factors that affect field recording. The same AI/ML scholarship also recommends fair data collection practices, such as remunerating contributors, while underlining the binding principle of "Nothing About Us Without Us". In the context of audiovisual language recording, this principle translates to the native community having agency over, or shared ownership of, data, which is often restricted through copyright and other institutional paywall processes. Questions that can support an inclusive approach, while still accounting for an archivist's limited access to resources, include:

How can I record narratives of people of different genders while still being conscious of community taboos?
As audio recordings of words (a pronunciation library) are often made to train speech recognition systems in the future, do the speakers understand how their data will be used, and do they consent to its public use in perpetuity?
What kind of remuneration can I arrange to collect data in a fair and equitable way?
Which tools and technologies would be useful and affordable in my situation?
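The pronunciation-library goal mentioned above (all or most words, recorded by people of all genders and in different dialects) can be tracked as a simple coverage check. The sketch assumes each recording is logged as a `(word, gender, dialect)` tuple and that the community has agreed on the gender and dialect categories to cover; the function name `coverage_gaps` and the placeholder labels are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

# One logged recording: (word, speaker gender, dialect). Hypothetical format.
Recording = Tuple[str, str, str]


def coverage_gaps(
    recordings: Iterable[Recording],
    words: Iterable[str],
    genders: Iterable[str],
    dialects: Iterable[str],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each word to the (gender, dialect) pairs that have no recording yet."""
    have: Set[Recording] = set(recordings)
    targets = list(product(genders, dialects))  # every combination to cover
    gaps: Dict[str, List[Tuple[str, str]]] = {}
    for word in words:
        missing = [(g, d) for g, d in targets if (word, g, d) not in have]
        if missing:
            gaps[word] = missing
    return gaps


if __name__ == "__main__":
    log = [("word_a", "woman", "dialect_1"), ("word_a", "man", "dialect_1")]
    print(coverage_gaps(log, ["word_a", "word_b"], ["woman", "man"], ["dialect_1"]))
```

Listing the missing combinations, rather than a single completion percentage, points a slow, community-led collection effort at whose voices are still absent.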

Areas of focus

While the first version (ver. 1.0) of OpenSpeaks focused primarily on the practical aspects of the audiovisual recording process and parts of the planning and post-processing stages, version 2.0 onward detailed consent and content rights, including copyright and content licensing. Version 3.0 incorporated considerations and a practical guide for making language documentation accessible to people with disabilities, particularly blindness and deafness.^[5]

In version 2.0 of OpenSpeaks, the interrelated areas of consent, content rights and licensing were explained in a single chapter.^[7] In a multimedia language-documentation workflow, consent can be understood as the mutual agreement between a native speaker who is interviewed by an archivist and the archivist. Documentation happens in environments where interviewees' understanding of the future use of the content varies. Even though consent is critical to documenting languages, standardizing the consent-seeking process is extremely hard and may be futile. Hence, the OpenSpeaks guide only aims to help an archivist understand their own context, and the module details different scenarios of the consent-seeking process. Given the complexity of this process, the answers to questions such as "how do I take permission for an interview?" or "do I take permission from an interviewee in writing or verbally?" are never definitive. While framing consent as paramount to ethical research in indigenous settings, Lovo et al. (2021) explain informed consent as (a) "a mechanism for respect of dignity and autonomy of persons that should be meaningful, trusting, transparent, un-intrusive, free of coercion, free and informative to protect human rights and bioethics", and (b) "collaborative and establishing a trusting relationship".^[17] Applying the same framing to the multimedia language-documentation process, the closely connected questions below can help identify an adequate process for seeking consent and create a participatory space; in our experience, they have consistently proved useful.

Are the persons about to be interviewed adults, and are they able to fully understand the purpose of the recording and its future publication?
If not, how can I discuss with them, or with someone they nominate, the complete purpose of the recording and the possible ways it will be used?
If they agree to the recording, what would be the best way to document the agreement and share it for any future use?
What is the interviewee's literacy level? Would verbal or written consent be appropriate, and what would be the best way to document it as evidence?

Revocation of consent is often intended to be part of the consent-seeking process. However, copyright licenses such as the Creative Commons licenses make a "work" (a published recording, in this context) a perpetual contribution. Hence, OpenSpeaks recommends having the conversation with individual interviewees and agreeing mutually on consent, content rights and copyright before the recording. Related considerations discussed in OpenSpeaks include: a) forms of documenting evidence in different shareable media formats, such as an agreement signed on paper or an agreement given verbally and recorded as part of the documentation itself, b) identifying the right agreement-sharing platform based on the access and affordability of both the interviewee and the interviewer, c) validating any indirect consent (e.g. physical gestures instead of written consent), and d) the process of revocation (in the case of a recording published without an open license). OpenSpeaks also discusses challenges an individual interviewee might face because of disability, old age, or limited fluency in the lingua franca shared with the interviewer (or any intermediary translator), which can disqualify an indirect consent legally or ethically at a later date.^[7]
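The distinctions drawn in this paragraph (form of evidence, indirect consent that needs validation, factors that may later disqualify it, and revocation only for recordings not published under an open license) can be captured in a simple consent log. The sketch below is a hypothetical structure, not part of OpenSpeaks; `ConsentRecord`, `Evidence` and the review messages are illustrative names, and it encodes only the rules stated above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Evidence(Enum):
    """Forms of consent evidence named in the text."""
    WRITTEN = "agreement signed on paper"
    VERBAL_RECORDED = "verbal agreement recorded with the documentation"
    INDIRECT = "indirect consent, e.g. physical gestures"


@dataclass
class ConsentRecord:
    """Hypothetical consent log entry mirroring the considerations in the text."""
    interviewee_id: str
    evidence: Evidence
    open_license: bool  # e.g. published under a Creative Commons license
    risk_factors: List[str] = field(default_factory=list)  # e.g. "old age"

    def revocation_possible(self) -> bool:
        # Open licenses such as Creative Commons are perpetual, so revocation
        # only applies to recordings published without an open license.
        return not self.open_license

    def needs_review(self) -> List[str]:
        """Reasons this consent may need revisiting before publication."""
        reasons = []
        if self.evidence is Evidence.INDIRECT:
            reasons.append("indirect consent must be validated")
            if self.risk_factors:
                reasons.append("may later disqualify indirect consent: "
                               + ", ".join(self.risk_factors))
        if self.open_license:
            reasons.append("license is perpetual; confirm the interviewee understands this")
        return reasons


if __name__ == "__main__":
    entry = ConsentRecord("int-01", Evidence.INDIRECT, open_license=True,
                          risk_factors=["limited lingua franca fluency"])
    print(entry.revocation_possible(), entry.needs_review())
```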

Content Rights

In addition to consent, ownership of content ("content rights") is a subjective area. In a set of Mesoamerican conversations, as in many other cultures, knowledge is perceived as the outcome of a collective process that does not guarantee any individual member of a community "final authority or ownership".^[18] Yet many people we met during our field documentation work expressed discontent with the irrevocable and perpetual nature of open licensing. A case study shared in OpenSpeaks through the short film Who Owns the Content? sheds light on the complex nature of ownership over documented content. In the film, the narrator Eddie Avila tells the story of a young Colombian who discovers audio cassette tapes of his father telling traditional stories and songs to a European researcher who recorded them.^[19] The examples and guides in OpenSpeaks address, to some extent, the dilemma of identifying content ownership. Often the question is not "who owns the recorded content?" but "is the content owned by a community (collective knowledge), or is it the creation of one or some individuals?". As this paper and other scholarship discuss, the histories of many indigenous peoples are not only poorly documented but have also been subject to century-long systemic erasure of indigenous knowledge, so reaching a conclusion about ownership is neither simple nor easy. An archivist who assumes ownership over content and wrongly attributes it to themselves or to some individuals under restrictive licenses can block many in the native speaker community from accessing their own community knowledge.

Copyright and licensing

As with other forms of intellectual property, copyright is assumed over all or most recorded content in language documentation. Copyright is premised on "published works" of all kinds that have "originality". The general rules of copyright, permission-seeking and attribution apply equally to any audio or video material. However, the ethical and moral sides of copyright are critical to the process in the OpenSpeaks framework. OpenSpeaks therefore frames self-assessment questions to help the archivist learn more about the different rights involved and identify which license would make sense.

The following question helps identify the applicable moral rights. A narrative (a story, song or other kind) that is predominantly known to a community implies (moral) ownership by the community, while copyright has to be identified in a slightly different way.
- What kind of content is recorded: is it folklore, a folk song or another narrative that is popular across the entire community, or is it something the interviewee has created on their own?
The following questions help identify who owns the copyright. Because many conditions are involved, copyright is identified by examining each condition objectively. For instance, in a self-sponsored documentation without any agreement detailing copyright, the copyright of the documented audio or video alone would lie with the archivist. The moral right, if the content includes folklore or a folk song, still lies with the community.
- Does the documentation work involve a contract or agreement that specifies copyright?
- Is the work sponsored/commissioned by anyone?
- Is the documentation done by one individual or multiple individuals?
- Is the documentation crew paid, or have they volunteered? If the work is volunteer work, how is the labor distributed?
When it comes to choosing an appropriate license, the archivist would generally decide based on a set of conditions, some of which are outlined in OpenSpeaks and reproduced below:
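The two rules stated above, community moral rights for community narratives and archivist copyright over the recording itself in self-sponsored work with no copyright agreement, can be expressed as a small decision helper. This is a sketch under stated assumptions: the names `Documentation` and `assess_rights` are hypothetical, it treats work by more than one person as unresolved because the text lists crew composition as a separate condition, and every case the text does not settle is returned as unresolved rather than guessed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Documentation:
    """Answers to the self-assessment questions (hypothetical field names)."""
    community_narrative: bool       # folklore, folk song, etc. known across the community
    has_copyright_agreement: bool   # a contract or agreement specifying copyright
    commissioned: bool              # sponsored or commissioned by someone else
    crew_size: int = 1              # number of people doing the documentation


def assess_rights(doc: Documentation) -> Dict[str, str]:
    """Apply only the rules stated in the text; anything else stays open."""
    moral = "community" if doc.community_narrative else "interviewee (own creation)"
    holder: Optional[str] = None
    if not doc.has_copyright_agreement and not doc.commissioned and doc.crew_size == 1:
        # Self-sponsored, no agreement: copyright in the recording itself
        # (not in the underlying narrative) lies with the archivist.
        holder = "archivist"
    return {
        "moral_rights": moral,
        "recording_copyright": holder or "unresolved: check agreement, commission and crew terms",
    }


if __name__ == "__main__":
    folk_song = Documentation(community_narrative=True,
                              has_copyright_agreement=False, commissioned=False)
    print(assess_rights(folk_song))
```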
- Is the archivist the sole/partial copyright owner of the content?
- Is the work a commissioned work and hence is copyrighted by someone else?
- Is there an agreement that defines the scope of copyright?
- In either of the cases above, does the copyright allow the native speaker community full access to the content, or does it restrict their access?
- Are any of the Creative Commons licenses applicable to the work? (OpenSpeaks shares the Creative Commons Choose (https://creativecommons.org/choose/) and beta Chooser (https://chooser-beta.creativecommons.org/) tools to help the archivist identify a particular license.)

As the focus of OpenSpeaks is to help an archivist ensure unrestricted and unpaywalled access to content for native language speakers, the license selection process also emphasizes identifying the license that works best for each community, as opposed to imposing only free content. Additionally, the plain-text version of a consent and rights release on OpenSpeaks draws on frugal field-documentation processes and on a content release used for donating content to Wikimedia projects,^[20] and keeps the release comprehensible to interviewees with low to medium literacy.

Lastly, the archivist does not always know every specific use of a published documentation, especially at the time of consent-seeking or, more specifically, when the content release agreement is made. Where revocation of consent and other permissions is not possible, the fair practice recommended from OpenSpeaks version 3.0 onward centers on fully clarifying these terms to the interviewee during consent-seeking.

Learning exercises

Survey for localization in Santali language

During the creation of version 2.0, we conducted a bilingual (English and Santali) survey^[21] on language content and rights, engaging a small group of 25 individuals, many of them native Santali speakers involved in content creation in the language. Some general observations from the survey are shared below:

Linguistic distribution:
- Among the 25 participants, 17 were native Santali-language speakers, from five different countries
- One-third of the participants represented a collective, nonprofit, academic or other civil society organization, while the rest belonged to other stakeholder groups in language documentation and practice digital activism independently, in a personal capacity.
- The participants are broadly active in promoting their language on digital media, including both audiovisual documentation and textual documentation such as blogging, content creation and sharing, and commenting on social media and web platforms.
- Twelve participants who actively archive their own languages (other than Santali) in various media self-identified as native speakers of at least one indigenous language, and at least four of them speak an oral language with no formally recognized writing system.
Consent-seeking:
- Thirteen participants said they were not fully aware of consent-seeking practices during documentation.
- We also learned that the respondents ask for consent in three major ways: a) through verbal discussions, b) during the recording process, c) through fillable forms before the recording.
- We also observed that most participants seek consent during the recording of the video, while five emphasized that consent-seeking is mostly done through verbal discussion and five others confirmed asking for consent through a form.
Copyright and licensing:
- While almost half of the participants confirmed that they know how to make audiovisual recordings in their own languages, the remaining half said they need help learning best practices or would like a beginner's guide.
- Most participants involved in creating multimedia documentation also expressed a need to understand copyright and the Creative Commons licenses for publishing the documented media.

The learning from this exercise validated our own preexisting notions around consent-seeking, copyright and the overall attribution process. It also helped break the linearity of the guides on these topics in Chapter 1 of version 2.0 of OpenSpeaks, and the translation process focused on explaining these concepts instead of merely translating them. The three Santali speakers who led the translation into Santali, R. Ashwani Banjan Murmu, Fagu Baskey and Joy sagar Murmu, drew on their respective experiences as community organizers and Wikimedians and used a feedback loop to influence the English version of the chapter. They also used a hybrid model that combined loanwords (transliterations of popular English terms such as "license" in the Ol Chiki alphabet used for Santali), newly coined terms that can be widely understood, and existing vocabulary.

Chapter 2: Audiovisual recording

This chapter details the process of audiovisually recording language use.

Module 1: Basics of audio-visual recording

An overview of the aims of the recording process and how to go about it.

Prerequisites

1. Be honest and ask your interviewee to be honest: Language is a very sensitive element of a society. When known or unknown mistakes, like mispronunciations, get recorded and shared publicly, native speakers might take offense. So, check with your interviewee and make sure you note any unintended mistakes in the description of the video or audio when publishing. You might not always be able to delete such portions, but you can always acknowledge that an unintended mistake was recorded. Similarly, if the interviewee is not a native speaker and is learning the language, state that clearly. Native speakers will welcome such honesty.

2. Imagine yourself out in the field interviewing someone who speaks a language you probably don’t understand: Think of the challenges you might face, such as meaning lost in translation or your limited understanding of their cultural and linguistic nuances. Are you going to use a language that is mutually intelligible to you both, get the questions translated, or bring a translator along to assist?

3. Plan in advance and practice well: Planning for a documentation starts with knowing your interviewee(s) well. Do some research about their language and culture, and perhaps learn a few of the most commonly used phrases in their language that you can say to amaze them during the interview. People generally appreciate it when an outsider makes an effort to speak their language. Use a spreadsheet or even an app to make a rough, agile plan. Things might change during the interview, and you need to be prepared for that. Also, have a plan B in case anything fails. If you get cold feet when meeting strangers, write down your questions and practice them with a friend or family member or in front of a mirror.
4. Know your hardware and software: As you are going to rely on your recording equipment and software (you will learn about them in the next module), it’s important that you know them well. How well is well? Well enough to know the ins and outs of your gear and to troubleshoot in an emergency. For instance, if you’re planning to use your phone for audio and video recording, check which apps are best for your workflow. It’s advisable to use apps (e.g. Filmic Pro for iOS devices) that show audio levels on screen while recording, so you know for sure that audio is indeed being recorded.
5. Keep a notebook or note-taking app to capture important data: Taking physical or digital notes while recording always helps during post-production. You also need to capture some metadata (more in Module 3), for which you can use your notes or a printed template. Keep in mind that any noise you make while writing might get recorded, so choose your pen carefully.
6. Ensure you record in a quiet place: The most challenging aspect of any recording is finding a quiet place for clean audio and a well-lit place for good-quality video. Check below to know what to avoid:

Noise sources	Possible solutions
Ambient noise (Audio)	Talk to the interviewee before recording to find the least noisy place to record. If you can, get a lavalier microphone (also known as a lav mic, lapel mic or clip mic) so that you get a clean sound, as it is placed close to the interviewee's face.
LED and other home electric lights (Video)	Most home lights appear to flicker on camera, which is distracting. You will learn more about solutions to such issues in the next module; until then, avoid home lighting and, if you can afford them, use lights recommended for filming. Alternatively, if you're filming during the day, you can sit close to a window so that the subject's face is lit by natural light.
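The on-screen level meters recommended in item 4, and the noise checks in item 6, can also be repeated on a short test take before the real interview. The sketch below is a minimal illustration in Python (an assumption; OpenSpeaks does not prescribe any code). It decodes 16-bit PCM samples and reports the peak level in dBFS (decibels relative to full scale), where 0 dBFS is the loudest value the format can store. The function names and the -60 dBFS "silent" threshold are hypothetical choices, not values from the toolkit.

```python
import math
from typing import Sequence

def samples_from_pcm16(frames: bytes) -> list[int]:
    """Decode little-endian 16-bit PCM bytes (as stored in WAV files) into ints."""
    return [
        int.from_bytes(frames[i:i + 2], "little", signed=True)
        for i in range(0, len(frames) - 1, 2)
    ]

def peak_dbfs(samples: Sequence[int], bit_depth: int = 16) -> float:
    """Peak level in dBFS; 0 dBFS is the loudest value the format can store."""
    full_scale = 2 ** (bit_depth - 1)  # 32768 for 16-bit signed PCM
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence: nothing was recorded
    return 20 * math.log10(peak / full_scale)

def level_report(samples: Sequence[int], bit_depth: int = 16,
                 silence_dbfs: float = -60.0) -> str:
    """Classify a test take as 'silent', 'clipping' or 'ok' (hypothetical thresholds)."""
    if peak_dbfs(samples, bit_depth) < silence_dbfs:
        return "silent"
    full_scale = 2 ** (bit_depth - 1)
    # Samples at (or one step below) full scale suggest the input gain is too high.
    if any(abs(s) >= full_scale - 1 for s in samples):
        return "clipping"
    return "ok"
```

For a 16-bit WAV test file, the samples can be obtained with Python's `wave` module, e.g. `samples_from_pcm16(w.readframes(w.getnframes()))` on a file opened with `wave.open(path, "rb")`. A "silent" result suggests the microphone is not being picked up; "clipping" suggests lowering the input gain.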

Interview process

Friendliness and empathy: The best emotions are captured when your interviewee trusts you the most. Try to be empathetic and friendly, relate to them on a human level and keep a check on their comfort level. They will open up about something they care about only when they feel they can trust you. Trust is built over time. How do you build it in a short interview?
Ice-breaker questions: You can always ask some trivial ice-breaking questions at the beginning and slowly move towards more personal questions.
Body language: In a physical interview, your body language matters much more than in a telephone or voice/video call. Positive body posture can set the mood of the subject entirely. So a rule of thumb is to be a good listener and show curiosity to learn from the interviewee. When you're interviewing someone who speaks an endangered language that is unfamiliar to you, you can still adopt the same body posture. Even though you won't understand the vocabulary, you can stay empathetic and try to relate to the interviewee by observing the emotional flow of the interview. You could reflect that with the right kind of camera moves.
Motion is emotion: Documenting a language is not just about placing a camera on a tripod and interviewing someone, though that's a good starting point. If you're recording people talking about their lives, you need to capture their lives on camera as well. If a picture is worth a thousand words, a video is worth a million! So take ample time to shoot some b-roll. For instance, if your interviewee has narrated a bedtime story during the interview, capture some relevant shots, like kids sitting around an older person, or parents with kids. B-roll clips are generally short, so shoot brief clips (30 seconds to 1 minute at most) and cover a wide range of scenes, because you never know where you might use them. You can use the b-roll as cutaway shots.

Module 2. Hardware and software for recording, and recording process

Audio recording

A home studio setup consisting of a computer with free and open-source audio recording/editing software such as Audacity installed, a professional microphone, and monitoring headphones. Read more in our Pronunciation Toolkit.

Different scenarios:

Home studio: If you're recording at home, try to create a minimal setup. You need a microphone to be able to record audio. If you can, I would suggest recording in a small home studio setup like the one pictured above (consisting of a USB microphone, a computer, and monitoring headphones).
Field recording with a recorder or phone: The recording setup will vary considerably if you are meeting someone outside your home for a field recording. In that case you will need to carry an audio recorder or a smartphone (with a recording app installed) with earphones. If you’re using a portable recorder, make sure you cover the top of the mic with a soft cotton cloth or fake fur to a) keep dust out, and b) reduce wind noise during outdoor recording. Use a rubber band to secure it at the base and never touch the cloth/fur while recording. Mics can pick up even small movements and completely distort the audio.
Recording from phone: The earphones (with a built-in microphone) that come with phones generally work with both phones and computers, and usually give better results than the default built-in microphone. However, avoid sitting in an open space, as a lot of noise is likely to be captured unless you are using a shotgun microphone.
Audio editing software: If you are editing on a computer, Audacity, a free and open-source audio editor, is the first choice for many seasoned recording artists. It is robust, easy to use and available on multiple platforms. If you are using your phone or tablet to record and edit the audio, use your native recording app or look for a good free alternative in your app store. Ideally, the recording/editing app should let you record in decent lossless quality (minimum requirements: a 44,100 Hz sample rate, a bit depth above 16-bit PCM, i.e. 24 or 32 bit, and a bitrate above 220 kbps; check your settings to find these). Save the audio as .WAV or .FLAC (Audacity supports both). If your recorder/phone does not support these formats, use an app or online converter (e.g. MP3→FLAC or M4A→FLAC) to convert the audio to .FLAC. Note that such a conversion prevents further quality loss but cannot restore detail already lost in a lossy format such as MP3.
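The minimum settings above can be checked on a finished file. For uncompressed PCM audio the bitrate is simply sample rate × bit depth × channels: for example, 44,100 Hz × 24 bit × 1 channel = 1,058,400 bit/s, about 1,058 kbps (well above the 220 kbps floor), which takes roughly 476 MB per hour of mono audio. The Python sketch below assumes an integer-PCM .WAV file, because the standard `wave` module cannot read FLAC; the function names are hypothetical and the thresholds are taken from the paragraph above.

```python
import wave

# Thresholds from the recommendation above; adjust them to your project's needs.
MIN_SAMPLE_RATE_HZ = 44_100
MIN_BIT_DEPTH = 24      # "above 16 bit", i.e. 24 or 32 bit
MIN_BITRATE_KBPS = 220

def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio: rate x depth x channels, in kbit/s."""
    return sample_rate_hz * bit_depth * channels / 1000

def pcm_megabytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Approximate storage for one hour of uncompressed PCM audio, in MB (10^6 bytes)."""
    bits_per_hour = sample_rate_hz * bit_depth * channels * 3600
    return bits_per_hour / 8 / 1_000_000

def check_wav(path: str) -> dict:
    """Read a PCM WAV header and compare it with the minimum settings."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        channels = wav.getnchannels()
        duration_s = wav.getnframes() / rate
    bitrate = pcm_bitrate_kbps(rate, bits, channels)
    return {
        "sample_rate_hz": rate,
        "bit_depth": bits,
        "channels": channels,
        "duration_s": duration_s,
        "bitrate_kbps": bitrate,
        "meets_minimum": (rate >= MIN_SAMPLE_RATE_HZ and bits >= MIN_BIT_DEPTH
                          and bitrate > MIN_BITRATE_KBPS),
    }
```

The storage estimate is also useful when deciding how many memory cards or how much disk space to bring to the field.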

Video recording

Which camera to use

Frankly speaking, video is less important here than audio. Viewers can still manage with low-quality video as long as the audio is loud and clear. So if you are keen on investing, invest in a good-quality microphone that can either be connected to the camera or used as a secondary recorder. But do not trust your camera’s built-in microphone; it can literally jeopardize your hard work. As for the camera, you can use practically any camera that records in decent quality, i.e. above 720p (1280×720 px), from your phone to a point-and-shoot camera to a DSLR.

a) Using a camera: Use a shotgun microphone that can be connected directly to your camera so that you don’t need to spend much effort on audio syncing during post-production.
b) Using a phone for recording video: These days most phones come with high-quality hardware capable of recording good video. But the real key to recording quality video on a phone lies in stabilizing the shot while recording. The simplest way to do that is to invest in a small tripod that can hold your phone (they are generally cheap and do the job). For this particular project, a tripod is the best option.

How to compress and share the videos: Compress the video using free software such as HandBrake, and upload it to YouTube or a similar platform without making it public. We will download it and ask you to delete it, so that you don’t have to worry about the space it takes up on your hard drive.

Chapter 3: Metadata collection and publication

Annotation, subtitling of audio/video, and translation of transcriptions and other content. Downloads: Content Release form (editable document in .odt and .docx, fillable form in .pdf); Metadata Documentation Sheet (.ods, .xlsx).

Annotation is the process of collecting additional information that helps provide background to a particular situation. For instance, in one indigenous community a particular alcoholic beverage is first offered to the local deity before anyone drinks it. A video that shows people drinking, with subtitles/captions of the conversation they are having, might not provide enough context. Such nuances are generally added as text or audio along with a timestamp (e.g. 01:36: Lakshmi and Babu are showing a gesture of respect to each other before drinking "rasi"). Audio/video content will also need subtitles in widely spoken languages such as English to reach a wider audience. Transcriptions are generally created to provide a verbatim version of the interview. Ideally, you should work with a native speaker after the interview to create the transcription, to ensure that no information is lost in the process. However, transcriptions are not easily digestible, so you also need to create summaries for each section of the interview that capture the highlights and sometimes details (for instance, a game or a story).
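A timestamped annotation log like the "01:36" example above is easy to keep in a spreadsheet. The Python sketch below converts positions in seconds into MM:SS timestamps and writes CSV text that LibreOffice Calc or Excel can open. The `Annotation` type, the function names and the `timestamp`/`note` columns are hypothetical; the columns in the OpenSpeaks Metadata Documentation Sheet may differ.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class Annotation:
    start_s: float  # position in the recording, in seconds
    note: str       # context that subtitles alone do not convey

def format_timestamp(seconds: float) -> str:
    """Format a position as MM:SS, or H:MM:SS for recordings longer than an hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"

def annotations_to_csv(annotations: list[Annotation]) -> str:
    """Return the annotations as CSV text, ordered by their position in the recording."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["timestamp", "note"])  # hypothetical column names
    for item in sorted(annotations, key=lambda a: a.start_s):
        writer.writerow([format_timestamp(item.start_s), item.note])
    return buffer.getvalue()
```

When saving the result, open the file with `encoding="utf-8"` and `newline=""` so that notes in non-Latin scripts (for example Ol Chiki for Santali) are preserved and the CSV line endings are not doubled.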

Chapter 4: Accessibility

Accessibility considerations ensure that everyone can access the published digital media with little or no hassle. The underlying principle of accessibility is ensuring that no one is excluded and making a conscious effort to avoid critical barriers for people with disabilities. Some of the most important considerations are subtitles/captions in audio and video, typefaces/fonts in visual media with proper contrast, size and alignment, and colors that are friendly to people with color blindness. To check whether the media you have published is accessible, you could use the checklist below.


Checklist item	Yes / No / How to	Recommendations
A. Video captioning: Do your audio/video files have subtitles/captions?	Yes	Closed captioning (CC) is preferred for web applications because the captions are not "burned in" (hardcoded) on the video but are displayed separately. It also makes translation easier if you release the captions in a timed-text format such as SubRip (files ending in .srt).	Open captioning means that the captions are "burned in" to the video as images. With open captions you can only watch the single version, whereas with closed captioning you can select from the different language versions available.
	No	Adding captions to videos is an essential requirement in linguistic documentation. There are many ways to add captions. On computers, a highly recommended application is Aegisub, as it supports all major platforms (Windows, macOS and other Unix-like operating systems). Many modern video editors also support captioning. If you are collaborating with remote translators, Amara, an open-source video subtitling platform, is a recommended option; it supports popular platforms such as Internet Archive, Vimeo and YouTube. YouTube also has built-in closed captioning. We strongly recommend the comprehensive guides that the BBC has created (available in short and long versions) to learn how to create accessible captions.
B. Audio/video transcriptions: Do you upload a transcription file separately along with your audio documentation?	Yes	Verbatim transcriptions retain stutters and fillers such as "umm..." and "hmm..." that are part of human speech. As the primary purpose of transcriptions is accessibility, verbatim transcriptions help convey exactly what was said.	Non-verbatim transcriptions either omit stutters and fillers entirely or replace them with explanatory text. You might have seen in (English-language) movie subtitles how [MUSIC][A 1] is written when background music is playing. Similarly, you can use different explanatory texts depending on the context (see below for how to transcribe).
	How to	Please see the Transcripts resource page by the W3C for more recommendations. A step-by-step guide to creating audio-to-text transcriptions may also be useful in some cases.
	No	Written languages: You should consider adding transcriptions to your audio and video. Simply put, a transcription is the text version of what is heard in an audio or video recording. Transcriptions are essential for people who are deaf or hard of hearing, and for people with full/partial blindness who use screen reader software to convert text into audio in order to access the content. Transcriptions are also helpful when a particular word is not clearly pronounced. Many written languages may not yet have speech synthesis software, but language documentation has a long lifespan, so a transcription you upload today might become useful some day. It is often uploaded separately as a text file along with the audio file. YouTube shows the transcription separately when the option is selected on the right side of the video (only when the video is captioned).	Spoken/oral languages: As oral languages do not have a writing system, you might consider first translating the content into a widely known language that is relevant in your context, and making that transcription available.
C. Color contrast	How to	High-contrast text is easily readable by people with low vision, so it should always take priority over aesthetic choices. Use high-contrast text in your titles/captions, in credits (for videos), in documents shared along with audio/video, and on web displays (websites, blogs, articles).	Very light text on a light background (e.g. grey text over a sky-blue background) is hard for many people to read.
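Item A of the checklist recommends releasing captions as SubRip (.srt) files. The format is plain text: each cue has a running number, a start and end time written as HH:MM:SS,mmm (with a comma before the milliseconds) separated by `-->`, the caption text, and a blank line before the next cue. The Python sketch below writes such a file from a list of timed captions; the `Cue` type and the function names are hypothetical. A translated version can be produced by keeping the timings and replacing only the text.

```python
from dataclasses import dataclass

@dataclass
class Cue:
    start_s: float  # when the caption appears, in seconds
    end_s: float    # when it disappears, in seconds
    text: str       # caption text; may contain line breaks

def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the SubRip time format."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(cues: list[Cue]) -> str:
    """Number the cues from 1 and separate them with blank lines, as .srt files expect."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        if cue.end_s <= cue.start_s:
            raise ValueError(f"cue {index} must end after it starts")
        times = f"{srt_timestamp(cue.start_s)} --> {srt_timestamp(cue.end_s)}"
        blocks.append(f"{index}\n{times}\n{cue.text}\n")
    return "\n".join(blocks)
```

Save the output with UTF-8 encoding so that captions in any script survive; captioning tools such as those listed in the checklist can typically import .srt files for further editing.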

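Item C's "high contrast" can be measured rather than judged by eye. The Web Content Accessibility Guidelines (WCAG 2.x) define a contrast ratio between two colors, computed from their relative luminance and ranging from 1:1 to 21:1, and level AA asks for at least 4.5:1 for normal text and 3:1 for large text. The Python sketch below implements that formula; the function names are hypothetical and colors are assumed to be six-digit hex codes such as #767676.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an sRGB channel (0-255) to linear light, per the WCAG 2.x definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#RRGGBB' (0 = black, 1 = white)."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_color!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)

def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

def passes_wcag_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """Level AA asks for at least 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

For example, grey #767676 text on a white background reaches about 4.54:1 and passes AA for normal text, while the slightly lighter #777777 reaches only about 4.48:1 and passes only for large text.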
Additional information

Self declaration

The authors self-identify as dominant-caste, cisgender male individuals in India's discriminatory Hindu caste system. Generally speaking, such individuals have historically received the most privileges through that system, irrespective of their political stands or understanding of social justice. The authors would further like to acknowledge that their personal socioeconomic privileges, in which the caste system plays a large role, have resulted in their early and wider access to the internet, fluency in English and access to a range of other resources. This access and these privileges resulted in their early entry into the open knowledge movement, such as Wikipedia/Wikimedia projects, and even made volunteer participation in such movements affordable for them. The authors are aware that they have benefited in their own lives from the existing discriminations of the caste system, patriarchy, the neocolonial practices of their own majoritarian community, and different forms of sociocultural systems.

Acknowledgements

OpenSpeaks has been enriched by a range of major projects, readings and interactions. It might not be possible to acknowledge all of them in chronological order, but the individuals and organizations include, but are not limited to:

Indigenous communities: Santali community (specifically Ramjit Tudu, R Ashwani Banjan Murmu, Fagu Baskey and Joy sagar Murmu); Bonda community of Bandhuguda, Malkangiri district, Odisha, India; Ho community of Keshpada, Mayurbhanj district, Odisha, India; Kusunda, Tharu and Magar communities of Kulmor, Dang district, Nepal; Gutob community of Tukum, Koraput district, Odisha, India
Civil society partners/donors: Eddie Avila and Rising Voices (Global Voices), Creative Commons, National Geographic Society, Mozilla Open Leadership Series, MJ Bear Fellowship 2017, Online News Association, Whose Knowledge?, UNESCO, Centre for Internet and Society (India), Adivasi Lives Matter, Digital Empowerment Foundation
Other communities and conferences: Wikimedians from around the world, particularly during Wikimania 2017, 2018 and 2019; Celtic Knot Conference 2018 and 2019; Creative Commons Global Summit 2019 and 2020; Internet Governance Forum; Mozilla Festival 2021; National Geographic Citizen Science Workshop 2018; two TEDx talks
The chapter "Chapter 1: Consent, Content Rights and Content Licensing" was created and expanded with a grant from Creative Commons.

Competing interests

The author has no competing interests.

Ethics statement

This project draws direct and indirect learning from the documentary films Gyani Maiya (2019), Mage Porob (2019) and Remosam (2019), which were made in collaboration with the Kusunda community of Nepal, the Ho community of India and the Bonda community of India, respectively. The participating individual members of these communities were interviewed with their consent, in accordance with the consent guidelines outlined in this project and the National Geographic Society release. Traditional community ethics were followed in all places while working together with indigenous groups, and high moral and ethical standards were otherwise adhered to.

Notes

[A 1] The vocabulary, format and style for transcriptions vary from platform to platform. For instance, some use [NAME OF SONG IN BACKGROUND] whereas others use icons such as ♬ NAME OF SONG IN BACKGROUND ♬ to represent the same thing.

References

[1] UNESCO Ad Hoc Expert Group on Endangered Languages (in English). Paris: International Expert Meeting on UNESCO Programme Safeguarding of Endangered Languages, UNESCO. 2003. pp. 7–8. https://ich.unesco.org/doc/src/00120-EN.pdf.
[2] Avila, Eddie (2017). "How indigenous digital activists are leveraging the internet to revitalize their native languages". Linguapax Review 5: 80–89. http://www.linguapax.org/wp-content/uploads/2018/11/LinguapaxReview2017_web-1.pdf.
[3] Seyfeddinipur, Mandana; Rau, Felix (September 2020). "Keeping it real: Video data in language documentation and language archiving". Language Documentation & Conservation 14: 503–519. ISSN 1934-5275. http://hdl.handle.net/10125/24965.
[4] Avila, Eddie (2021). "Technology in Language Revitalization: Rising Voices". In Olko, Justyna; Sallabank, Julia. Revitalizing Endangered Languages: A Practical Guide. Cambridge: Cambridge University Press. pp. 315–316. doi:10.1017/9781108641142.018. ISBN 9781108641142. https://www.cambridge.org/core/books/revitalizing-endangered-languages/technology-in-language-revitalization/9C4ED484CB915554C249941840999821.
[5] Le Guen, Laila; Panigrahi, Subhashish (2021). "OpenSpeaks Accessibility". Wikimedia Deutschland. Archived from the original on 2021-11-21.
[6] "Global action plan of the International Decade of Indigenous Languages (IDIL 2022-2032)". UNESCO. 2021. Retrieved 2021-11-22.
[7] Wikiversity contributors (2021-06-23). "OpenSpeaks". Wikiversity. Retrieved 2021-12-11.
[8] "International Conference Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide" (PDF). UNESCO. 2019-12-04. Retrieved 2021-11-28.
[9] Kung, Susan Smythe; Pojman, Elena; Niwagaba, Alicia (2020). "Archiving for the Future: Simple Steps for Archiving Language Documentation Collections [OER]". Retrieved 2021-12-22.
[10] Daigneault, Anna Luisa; Udell, Daniel Bögre; Tcherneshoff, Kristen; Anderson, Gregory D. S. (2021). "The Language Sustainability Toolkit". Living Tongues Institute. Retrieved 2021-12-22.
[11] Silvertown, Jonathan (2009). "A new dawn for citizen science". Trends in Ecology & Evolution 24 (9): 467–471. doi:10.1016/j.tree.2009.03.017. ISSN 1872-8383.
[12] Anderson, Gregory D. S.; Gomango, Opino (2016). "On the current status and state of Juray in the Sora-Juray cluster". In Ostler, Nicholas; Mohanty, Panchanan. FEL XX: Language Colonization and Endangerment: Long-term effects, echoes and reactions: Proceedings of the 20th FEL Conference 9–12 December 2016. Hungerford, England: FEL. pp. 103–109. ISBN 9780956021083.
[13] Ani, Kelechi Johnmary (2012). "UNESCO prediction on the extinction of Igbo language in 2025: analyzing societal violence and new transformative strategies". Developing Country Studies 2 (8): 110–118. ISSN 2225-0565. https://www.iiste.org/Journals/index.php/DCS/article/view/2934.
[14] Panigrahi, Subhashish (2020-05-11). "Promoting coronavirus education through indigenous languages". Global Voices. Retrieved 2021-05-05.
[15] Littell, Patrick; Kazantseva, Anna; Kuhn, Roland; Pine, Aidan; Arppe, Antti; Cox, Christopher; Junker, Marie-Odile (2018). "Indigenous language technologies in Canada: Assessment, challenges, and successes". Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico: International Committee on Computational Linguistics. pp. 2620–2632. ISBN 978-1-948087-50-6. https://aclanthology.org/C18-1222.pdf.
[16] Gebru, Timnit (2020). "Race and Gender". In Dubber, Markus D.; Pasquale, Frank; Das, Sunit. The Oxford Handbook of Ethics of AI. Oxford University Press. pp. 251–269. doi:10.1093/oxfordhb/9780190067397.013.16. ISBN 9780190067397.
[17] Lovo, Etivina; Woodward, Lynn; Larkins, Sarah; Preston, Robyn; Baba, Unaisi Nabobo (2021-10-09). "Indigenous knowledge around the ethics of human research from the Oceania region: A scoping literature review". Philosophy, Ethics, and Humanities in Medicine 16 (1). doi:10.1186/s13010-021-00108-8. ISSN 1747-5341. http://dx.doi.org/10.1186/s13010-021-00108-8.
[18] Brown, Penelope; Sicoli, Mark A.; Le Guen, Olivier (October 2021). "Cross-speaker repetition and epistemic stance in Tzeltal, Yucatec, and Zapotec conversations". Journal of Pragmatics 183: 256–272. doi:10.1016/j.pragma.2021.07.005. ISSN 0378-2166. http://dx.doi.org/10.1016/j.pragma.2021.07.005.
[19] Panigrahi, Subhashish (Director) (2019). Who Owns the Content? (Film). O Foundation. Archived from the original on 2021-08-15. Retrieved 2021-12-12.
[20] Wikimedia Commons contributors. "Commons:Wikimedia VRT release generator". Wikimedia Commons. Retrieved 2021-12-23.
[21] Panigrahi, Subhashish; Tudu, Ramjit (2020-11-28). "We're updating OpenSpeaks and we'd love to hear from you!". O Foundation. Retrieved 2022-01-04.

[24] The vocabulary, format and style for transcriptions vary from platform to platform. For instance, some use [NAME OF SONG IN BACKGROUND] whereas others use icons such as ♬ NAME OF SONG IN BACKGROUND ♬ for representing the same thing.

[3] UNESCO Ad Hoc Expert Group on Endangered Languages (in English). Paris: International Expert Meeting on UNESCO Programme Safeguarding of Endangered Languages, UNESCO. 2003. pp. 7–8. https://ich.unesco.org/doc/src/00120-EN.pdf.

[4] Avila, Eddie (2017). "How indigenous digital activists are leveraging the internet to revitalize their native languages". Linguapax Review 5: 80–89. http://www.linguapax.org/wp-content/uploads/2018/11/LinguapaxReview2017_web-1.pdf.

[:1-5] 3.0 ^3.1 ^3.2 ^3.3 ^3.4 Seyfeddinipur, Mandana; Rau, Felix (September 2020). "Keeping it real: Video data in language documentation and language archiving". Language Documentation & Conservation 14: 503–519. ISSN 1934-5275. http://hdl.handle.net/10125/24965.

[6] Avila, Eddie (2021). "Technology in Language Revitalization: Rising Voices". In Olko, Justyna; Sallabank, Julia. Revitalizing Endangered Languages: A Practical Guide. Cambridge: Cambridge University Press. pp. 315–316. doi:10.1017/9781108641142.018. ISBN 9781108641142. https://www.cambridge.org/core/books/revitalizing-endangered-languages/technology-in-language-revitalization/9C4ED484CB915554C249941840999821.

[:3-7] 5.0 ^5.1 Le Guen, Laila; Panigrahi, Subhashish (2021). "OpenSpeaks Accessibility". Wikimedia Deutschland. Archived from the original on 2021-11-21. {{cite web}}: |archive-date= / |archive-url= timestamp mismatch (help)

[8] "Global action plan of the International Decade of Indigenous Languages (IDIL 2022-2032)". UNESCO. 2021. Retrieved 2021-11-22.

[:2-9] 7.0 ^7.1 ^7.2 Wikiversity contributors (2021-06-23). "OpenSpeaks". Wikiversity. Retrieved 2021-12-11.

[:0-10] 8.0 ^8.1 "International Conference Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide" (PDF). UNESCO. 2019-12-04. Retrieved 2021-11-28.

[11] Kung; Smythe, Susan; Pojman, Elena; Niwagaba, Alicia (2020). "Archiving for the Future: Simple Steps for Archiving Language Documentation Collections [OER]". Retrieved 2021-12-22.

[12] Daigneault, Anna Luisa; Udell, Daniel Bögre; Tcherneshoff, Kristen; Anderson, Gregory D. S. (2021). "The Language Sustainability Toolkit". Living Tongues Institute. Retrieved 2021-12-22.

[13] Silvertown, Jonathan (2009). "A new dawn for citizen science". Trends in Ecology & Evolution 24 (9): 467–471. doi:10.1016/j.tree.2009.03.017. ISSN 1872-8383.

[14] Anderson, Gregory D. S.; Gomango, Opino (2016). "On the current status and state of Juray in the Sora-Juray cluster". In Ostler, Nicholas; Mohanty, Panchanan. FEL XX: Language Colonization and Endangerment: Long-term effects, echoes and reactions: Proceedings of the 20th FEL Conference 9–12 December 2016. Hungerford, England: FEL. pp. 103–109. ISBN 9780956021083.

[15] Ani, Kelechi Johnmary (2012). "UNESCO prediction on the extinction of Igbo language in 2025: analyzing societal violence and new transformative strategies". Developing Country Studies 2 (8): 110–118. ISSN 2225-0565. https://www.iiste.org/Journals/index.php/DCS/article/view/2934.

[16] Panigrahi, Subhashish (2020-05-11). "Promoting coronavirus education through indigenous languages". Global Voices. Retrieved 2021-05-05.

[17] Littell, Patrick; Kazantseva, Anna; Kuhn, Roland; Pine, Aidan; Arppe, Antti; Cox, Christopher; Junker, Marie-Odile (2018). "Indigenous language technologies in Canada: Assessment, challenges, and successes". Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico: International Committee on Computational Linguistics. pp. 2620–2632. ISBN 978-1-948087-50-6. https://aclanthology.org/C18-1222.pdf.

[18] Gebru, Timnit (2020). "Race and Gender". In Dubber, Markus D.; Pasquale, Frank; Das, Sunit. The Oxford Handbook of Ethics of AI. Oxford University Press. pp. 251–269. doi:10.1093/oxfordhb/9780190067397.013.16. ISBN 9780190067397.

[19] Lovo, Etivina; Woodward, Lynn; Larkins, Sarah; Preston, Robyn; Baba, Unaisi Nabobo (2021-10-09). "Indigenous knowledge around the ethics of human research from the Oceania region: A scoping literature review". Philosophy, Ethics, and Humanities in Medicine 16 (1). doi:10.1186/s13010-021-00108-8. ISSN 1747-5341. http://dx.doi.org/10.1186/s13010-021-00108-8.

[20] Brown, Penelope; Sicoli, Mark A.; Le Guen, Olivier (2021-10). "Cross-speaker repetition and epistemic stance in Tzeltal, Yucatec, and Zapotec conversations". Journal of Pragmatics 183: 256–272. doi:10.1016/j.pragma.2021.07.005. ISSN 0378-2166. http://dx.doi.org/10.1016/j.pragma.2021.07.005.

[21] Panigrahi, Subhashish (Director) (2019). Who Owns the Content? (Film). O Foundation. Archived from the original on 2021-08-15. Retrieved 2021-12-12.

[22] Wikimedia Commons contributors. "Commons:Wikimedia VRT release generator". Wikimedia Commons. Retrieved 2021-12-23.

[23] Panigrahi, Subhashish; Tudu, Ramjit (2020-11-28). "We're updating OpenSpeaks and we'd love to hear from you!". O Foundation. Retrieved 2022-01-04.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[A 1]