AR Conference

This learning resource is about the requirements and constraints of Augmented Reality (AR) Conferences.

Kanji marker^[1] for AR.js: the animated 3D model of an AR conference participant is placed on the marker in the camera image. Users can arrange the participants by positioning the markers that appear in the camera image.

Objective

The learning resource elaborates on the basic concepts of AR conferences and on the requirements and constraints that arise when using web-based technologies. The AR conference should remain usable where only a low-bandwidth connection is available.

History of Learning Resource

The concept of this learning resource was driven by the use of standard video conferencing systems during the COVID-19 pandemic and by the missing physical presence in a lecture room, where you can look around and watch the different participants speak or observe the other students in the seminar room or classroom. Learning environments based on video conferences are screen-focused, whereas a classroom or seminar room is an open space in which learners interact with the material in the room and look at each other while interacting with the learning environment. This comparison of the real classroom situation with a video conference learning environment led to this learning resource about Augmented Reality conferences.

Requirements and Constraints

The learning resource follows the Open Community Approach and shares its content on Wikiversity.
The learning resource uses open-source software to implement prototypes and small test scenarios that explore the basic constituents of AR conferences.

Learning Tasks

(3D Modelling) Explore the learning resource about 3D Modelling and learn about basic concepts of generating 3D models of the lecture room or classroom
(3D Design of Classrooms) Explore the OpenSource software Sweet Home 3D for generating the classroom in 3D.
(Augmented Reality with Markers) Assume that all participants are in a larger room that is sufficient for 5 participants. Explore marker-based Augmented Reality and assume that there is a marker for each of the 5 people; each marker is placed on a chair to mark the camera position or the position of a single participant in the room. Assume that the face animation of a remote participant is displayed on the marker in the room and that the other participants wear a head-mounted display running the web-based application (e.g. with AR.js), so that the other people are projected into the real camera image.
(Audio-Video-Compression) Analyze the learning resource about Audio-Video-Compression and identify possibilities to reduce the bandwidth, e.g. using one GIF animation for silence and one for talking, projected on a plane in the A-Frame model.
- Use silence detection in the audio stream to select which GIF animation is displayed on the A-Frame plane of the AR.js environment.
- Rendering photorealistic faces as 3D models demands a lot of client performance. Therefore this learning resource uses GIF animations of short video streams instead.
- Assume you have multiple GIF animations for different emotions. Is it possible to detect emotions in an audio stream and display the appropriate GIF animation in the A-Frame model for the AR conference?
- Assume you use phoneme recognition and transmit just the phoneme sequence in low-bandwidth environments. How do you use the animation features of A-Frame to shape the mouth of the model appropriately for the spoken words, e.g. the "o" in "onion" or the "sh" in "shark"?
(Motion Capture) Motion capture is a standard method of transferring the movements of an actor to a digital model (e.g. a robot, a fictional character in a movie, a dinosaur, ...). How can motion capture be used with open-source software? Assume you transmit the positions of markers on the face instead of the real camera image. What is the rough compression rate compared to an HD video stream?
(Read Words from Facial Expressions) Some people who have difficulty hearing someone speak have the skill to read the spoken words from the face. How can this expertise be used to improve speech recognition and to send the recognized words instead of the audio stream? How can this feature be used to support people with disabilities in an AR video conference?
(AR Conference on Mars) We use the remote location of planet Mars to explain the concept of an Augmented Reality video conference in the context of a learning environment. Assume we use a Mars rover that delivers a stereoscopic 360-degree image (visual information for the left and right eye).
- 5 learners and a teacher meet in a physical room (green box) with a motion capture option for the 6 people in the green room. All people in the room wear a head-mounted display, and the green screen method replaces the background with a real 360-degree image from Mars (keep in mind the latency of about 8 min from Mars to Earth).
- Now we replace the real stereoscopic camera image from the Mars rover with a 3D model of the Mars surface (see Digital Elevation Model (DEM)). Teachers and learners can now jump to different locations on Mars and explore the surface, and the modelling allows them to view the situation on Mars when water was available there.
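The latency mentioned in the Mars task can be estimated from the distance between the planets. The following is a minimal sketch of that arithmetic; the two distances are approximate round figures assumed for illustration and are not taken from this learning resource. It only covers the travel time of the signal, not any processing or relay delays.

```python
# Speed of light in vacuum in km/s.
SPEED_OF_LIGHT_KM_S = 299_792.458


def one_way_delay_minutes(distance_km: float) -> float:
    """Return the one-way signal travel time in minutes for a given distance."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60.0


# Approximate Earth-Mars distances (assumed round figures for illustration).
DISTANCES_KM = {
    "closest approach": 54.6e6,
    "farthest separation": 401e6,
}

for label, km in DISTANCES_KM.items():
    print(f"{label}: {one_way_delay_minutes(km):.1f} min one-way")
```

Under these assumptions the delay changes considerably over time, so a live image from the rover is always several minutes old and interactive control of the camera is not possible in real time.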

Transfer the remote AR conference on Mars to an island that is affected by climate change, so that learners can explore the situation on that island in an AR conference. Compare the AR conference with a real visit to that island and with talking to people who live there and were exposed to the rising sea level. Discuss also the carbon footprint of learning environments.
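Returning to the Audio-Video-Compression task: the silence detection step can be sketched as a simple energy threshold on short audio frames. This is only one possible approach. The file names, the frame size and the threshold value below are hypothetical choices for illustration, and the samples are assumed to be floats in the range [-1, 1].

```python
import math
from typing import List, Sequence

# Hypothetical file names of the two animations shown on the plane.
GIF_TALKING = "talking.gif"
GIF_SILENT = "silent.gif"


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_gif(samples: Sequence[float], threshold: float = 0.02) -> str:
    """Choose the talking animation if the frame is louder than the threshold."""
    return GIF_TALKING if rms(samples) >= threshold else GIF_SILENT


def gif_timeline(signal: Sequence[float], frame_size: int = 480,
                 threshold: float = 0.02) -> List[str]:
    """Split the signal into frames and return the chosen animation per frame."""
    return [
        select_gif(signal[start:start + frame_size], threshold)
        for start in range(0, len(signal), frame_size)
    ]


if __name__ == "__main__":
    silence = [0.0] * 480
    speech_like = [0.5, -0.5] * 240  # loud dummy signal standing in for speech
    print(gif_timeline(silence + speech_like))
```

With such a scheme only the state (silent or talking) has to be evaluated per frame, and the animation itself is loaded once on the client. A fixed threshold is a simplification, since background noise would probably require an adaptive threshold.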

(Mathematical Modelling) This learning task was created for learners with a mathematical background. Assume you have a body position encoded with 25 points at 25 frames per second. Every point $P_{(k,t)}:=(x_{(k,t)},y_{(k,t)},z_{(k,t)})\in \mathbb{R}^{3}$ with $k\in \{1,\ldots,25\}$ and $t\in \{1,\ldots,100\}$ consists of an $x$, $y$ and $z$ coordinate. Each single coordinate is represented by a real value (stored as a float variable in a programming language). The movement is recorded for 4 sec (i.e. 100 frames). Calculate the number of real values that you need to encode the body movement over 4 sec. Compare the required storage to the storage of the whole body surface encoded as 3D points. What are the benefits and drawbacks of such an encoding? How would you apply it to the facial motion capture of an AR conference?
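The counting in this task, and the rough compression rate asked for in the Motion Capture task, can be checked with a few lines of code. The sketch below makes several assumptions that are not part of the task: 32-bit floats, an uncompressed 1920x1080 RGB video with 3 bytes per pixel as the reference, and a hypothetical body surface of 10,000 points. Real video streams are compressed, so the ratio against an actual HD stream would be much smaller.

```python
BYTES_PER_FLOAT = 4  # assumption: 32-bit floats


def keypoint_values(points: int = 25, fps: int = 25, seconds: int = 4,
                    coords: int = 3) -> int:
    """Number of real values for `points` 3D points over the recording time."""
    return points * coords * fps * seconds


def raw_video_bytes(width: int = 1920, height: int = 1080, fps: int = 25,
                    seconds: int = 4, bytes_per_pixel: int = 3) -> int:
    """Size of an uncompressed video of the same duration (assumed reference)."""
    return width * height * bytes_per_pixel * fps * seconds


if __name__ == "__main__":
    skeleton = keypoint_values()
    surface = keypoint_values(points=10_000)  # hypothetical surface point count
    skeleton_bytes = skeleton * BYTES_PER_FLOAT
    print("values for 25 points:", skeleton)
    print("bytes for 25 points:", skeleton_bytes)
    print("values for surface points:", surface)
    print("raw video bytes:", raw_video_bytes())
    print("raw video / keypoints:", raw_video_bytes() / skeleton_bytes)
```

The same functions can be reused for facial motion capture by changing `points` to the number of tracked face markers.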

References

[1] Kanji Marker provided by ARToolkit on Github (accessed 2017/12/12) - https://github.com/artoolkit/artoolkit5/blob/master/doc/patterns/Kanji%20pattern.pdf
[2] Olsen, NL; Markussen, B; Raket, LL (2018), "Simultaneous inference for misaligned multivariate functional data", Journal of the Royal Statistical Society Series C, 67 (5): 1147–76, arXiv:1606.03295, doi:10.1111/rssc.12276
