MEDIA OUTREACH

Agora Supports 360° Spatial Audio and Exclusive Avatars with MetaChat Solution

CHINA – 10 January 2022 – Register, log in, choose a name, generate your own avatar, and select any chatroom to enter, and you will be able to interact with numerous other users in a low-latency, stable, and smooth manner within the “metaverse”. There are various virtual scenes to choose from, and 360° spatial audio lets both “social butterflies” and the “socially anxious” experience the immersion of a “face-to-face” conversation even when they are thousands of miles apart. The technology behind this is Agora’s MetaChat solution.

Agora’s MetaChat solution provides a new form of voice chat in which users participate through their avatars. The solution already supports user-defined face customization (“face pinching”), clothing and hairstyle pairing, and other features, which allow users to create a unique avatar for themselves. In addition, a user’s avatar style can be customized to meet the needs of developers for various business scenarios, including virtual reality and cartoon.

Agora’s MetaChat solution currently provides three virtual scenarios: Party, Cafe, and Bar. In the future, the solution will expand to more diverse scenarios such as exhibitions, study rooms, and disco halls. Developers can choose the settings that suit their business, and can also customize scenarios on demand. Currently, each chatroom supports 16 players. Each player can see the avatars of the other players and interact with any of them. They can also choose any empty seat to sit down, drink at the bar, or dance.

Agora’s MetaChat solution supports 360° spatial audio, which creates an immersive, lifelike experience for players in a chatroom and makes interactions more efficient and interesting. Agora’s 360° spatial audio uses a software-only algorithm that simulates the stereo sound field in the spherical area around a person’s head. It is therefore not limited by hardware, and users can experience it with ordinary headphones on their phone or PC. When users move their avatars in a virtual scenario, the audio they hear changes with the orientation of the avatar’s head and face as well as the angle and distance of the sound source, simulating lifelike audio. Specifically, in a voice chatroom:

After entering a chatroom, players will hear the background music (BGM) played in the chatroom, such as the light music in a cafe or the live band or DJ in a bar. When the player moves to different spots in the chatroom, they will hear a different BGM effect.

For example, the BGM becomes quieter as the player moves away from it. A player can also approach other players and begin chatting with them by moving forward, backward, left, or right. In addition, the BGM in the chatroom can coexist with players’ voices: a player can hear the BGM in the chatroom as well as the voices of other players within a certain range.
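The behaviour described above (volume falling with distance, a cut-off range for voices, and left/right balance that follows the avatar's facing direction) can be illustrated with a minimal sketch. This is not Agora's algorithm, which the release does not detail; it is a simplified 2D stereo model using inverse-distance attenuation and constant-power panning. The function names, the `ref_distance` and `max_range` parameters, and the compass-style heading convention are all assumptions made for illustration.

```python
import math


def distance_gain(distance, ref_distance=1.0, max_range=10.0):
    """Inverse-distance attenuation with a hard audible range (assumed model)."""
    if distance > max_range:
        return 0.0  # the source is outside the audible range
    # Full volume inside ref_distance, then falling off as 1/distance.
    return ref_distance / max(distance, ref_distance)


def stereo_gains(listener_pos, listener_heading, source_pos,
                 ref_distance=1.0, max_range=10.0):
    """Return (left, right) gains for a source heard by a listener.

    Positions are (x, y) tuples. listener_heading is in radians, measured
    clockwise from the +y axis (0 means facing +y), like a compass bearing.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    gain = distance_gain(math.hypot(dx, dy), ref_distance, max_range)

    # Bearing of the source relative to where the avatar is facing.
    relative = math.atan2(dx, dy) - listener_heading
    pan = math.sin(relative)  # -1 = fully left, +1 = fully right

    # Constant-power panning keeps perceived loudness steady across the arc.
    angle = (pan + 1.0) * math.pi / 4.0
    return gain * math.cos(angle), gain * math.sin(angle)


if __name__ == "__main__":
    me = (0.0, 0.0)
    print(stereo_gains(me, 0.0, (0.0, 2.0)))          # source straight ahead
    print(stereo_gains(me, 0.0, (2.0, 0.0)))          # source to the right
    print(stereo_gains(me, math.pi / 2, (2.0, 0.0)))  # after turning to face it
    print(stereo_gains(me, 0.0, (0.0, 20.0)))         # source out of range
```

A two-channel pan like this cannot distinguish a source in front from one behind, nor convey height. A full 360° effect over headphones is typically achieved by filtering with head-related transfer functions instead of applying simple gains.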

In the future, Agora’s MetaChat will also support two other features: voice-driven mouth shapes and voiceprint-based voice change.

1. For the voice-driven mouth shape, Agora will provide the following two solutions:

  • Audio-only: There is no need to capture facial expressions. The algorithm intelligently associates speech in Chinese, English, or other languages with the mouth shapes and facial expressions of the avatars, driving the avatars to speak like real people. The solution supports 2D portraits and 3D portrait models.
  • Facial expression capture: A facial expression capture device (such as the front-facing camera of a phone) needs to be activated. By capturing facial expression coefficients, it accurately identifies movements such as blinking, frowning, opening the mouth, or turning the head.

2. Agora’s voiceprint-based voice change feature converts a person’s voice into another person’s voice or another style while keeping the semantics unchanged. There are two types, voice change and voice improvement:

  • Voice change: This includes voice change, style change, and emotion change. It is often used for entertainment.
  • Voice improvement: The solution can make halting speech fluent and turn a tired voice into a resounding one. It is often used in business settings such as delivering a speech.