Understanding AR, VR, and MR Technologies

Posted by Anonymous and classified in Design and Engineering

Written on June 12, 2026 in English with a size of 344.75 KB

Augmented Reality (AR) Fundamentals

AR is a technology that integrates digital information with the user’s real-world environment in real time, where virtual objects are spatially and temporally registered with physical objects and are interactive. Unlike Virtual Reality, which creates a fully artificial environment, AR enhances the real world by overlaying computer-generated content on it.

Characteristics of AR

AR is defined by the following three main characteristics:

Combines Real & Virtual World: AR blends digital elements such as images, text, and 3D models with the real physical environment instead of replacing it.
Interactive in Real Time: The AR system responds instantly to the user’s movement and actions, allowing users to interact with both real and virtual objects.
Registered in 3D: Virtual objects are placed accurately in the real environment. They stay anchored to their position and are redrawn with the correct perspective as the user moves, so they remain aligned with the real world.

Applications of AR

Retail: Customers can try products virtually before purchasing, such as placing furniture in their homes using AR apps like IKEA Place.
Entertainment & Gaming: AR is used in games and social media filters, such as Pokémon Go and Snapchat effects.
Architecture & Construction: AR helps architects visualize buildings and designs before construction.
Navigation: AR shows directions on live camera views of roads to assist users.

AR System Architecture

An AR system architecture describes how different components work together to combine the real world with virtual (digital) content in real time.

Main Components of an AR System

User: The person who interacts with the AR system. The system is designed to help the user by providing enhanced visual or informational support (e.g., a doctor using AR glasses during surgery).
Device: The hardware used to run AR, such as a smartphone, tablet, AR glasses, or head-mounted display (HMD). The device contains a camera, sensors, display, and processor.
Real Content: Real-world information captured by the device, such as physical objects, location, environment, and live camera feeds.
Tracking: Identifies the position and orientation of the user and real-world objects. It ensures virtual objects are placed in the correct location using cameras, GPS, sensors, and image recognition.
Virtual Content: Digital information added to the real world, such as 3D models, text, images, videos, and audio.

Mixed Reality (MR)

MR is a technology that blends the real and virtual worlds so that physical and digital objects can exist together and interact with each other in real time. In MR, virtual objects are not just placed on top of the real world; they are aware of the real environment and can sit on real tables, hide behind real walls, or be blocked by real objects.

How MR Works

MR uses cameras, depth sensors, environment scanning, and spatial mapping to understand the shape and position of real-world objects, allowing it to correctly place and control virtual objects within that space. Because virtual content responds to the geometry of the room (for example, by being occluded by a real wall), MR generally feels more convincing and immersive than a simple AR overlay.
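
The occlusion behaviour described above can be illustrated with a per-pixel depth comparison. This is a minimal sketch under stated assumptions, not how any particular headset implements it: the function name `composite_with_occlusion` and the four-pixel "scanline" are hypothetical, and a real system would run this on the GPU using a depth map produced by spatial mapping.

```python
def composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth):
    """Show a virtual pixel only where it is closer to the viewer than the real surface.

    All four arguments are flat lists of equal length (one entry per pixel).
    Depths are in metres; virt_depth is None where no virtual content is drawn.
    """
    out = []
    for r_col, r_d, v_col, v_d in zip(real_rgb, real_depth, virt_rgb, virt_depth):
        if v_d is not None and v_d < r_d:
            out.append(v_col)   # virtual object is in front of the real surface
        else:
            out.append(r_col)   # real surface hides (occludes) the virtual object
    return out


# Hypothetical scanline: open floor (10 m away) on the left, a real wall at 2 m
# on the right, and a virtual cube at 3 m drawn across the two middle pixels.
real_rgb = ["floor", "floor", "wall", "wall"]
real_depth = [10.0, 10.0, 2.0, 2.0]
virt_rgb = [None, "cube", "cube", None]
virt_depth = [None, 3.0, 3.0, None]

scanline = composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth)
print(scanline)   # ['floor', 'cube', 'wall', 'wall']
```

The cube is visible in front of the distant floor but hidden where the nearer wall stands in the way, which is the "hide behind real walls" behaviour that distinguishes MR from a plain overlay.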

Comparison: AR, VR, and MR

Feature	Augmented Reality (AR)	Virtual Reality (VR)	Mixed Reality (MR)
Real world visible	Yes	No	Yes
Virtual world visible	Yes	Yes (only virtual)	Yes
Interaction with real objects	Limited	No	Yes
Interaction between real & virtual	Limited (overlay only)	No	Yes
User environment	Real + digital overlay	Fully virtual	Real + interactive digital
Example	Snapchat filters	VR gaming headset	Microsoft HoloLens

AR/MR Algorithm Steps

Capture Real World: The device uses cameras and sensors (GPS, gyroscope, accelerometer) to capture the live environment.
Detect & Track Environment: The system identifies user position, device orientation, and surfaces. In MR, this includes spatial mapping.
Recognize Targets: The system looks for images, markers, or objects to determine where virtual content should be placed.
Generate Virtual Content: The system loads 3D models, text, or animations based on the recognized target.
Registration: The system aligns virtual objects with the real world so that they appear fixed in position.
Rendering: The system composites the virtual objects onto the real camera view.
Interaction: The user interacts via touch, gesture, voice, or movement.
Display: The final scene is shown on the device screen or headset.

Input Modalities in AR

Touch Input: Tapping, swiping, and dragging on mobile screens.
Gesture Input: Hand and body movements tracked by cameras and depth sensors.
Voice Input: Spoken commands for hands-free interaction.
Sensor-based Input: Using accelerometers, gyroscopes, and GPS to detect motion and orientation.
Camera Input: Capturing real-world images to identify markers or objects.
Eye-tracking Input: Selecting objects by looking at them.
Tangible Input: Using physical objects as controllers.

Output Modalities

Visual Output: Overlaying virtual objects, text, and animations on the real world.
Audio Output: Providing instructions, alerts, or spatial audio.
Haptic (Touch) Output: Providing physical sensations like vibration or force feedback.
Tangible Output: Using physical objects to provide touch feedback for virtual content.

Multimodal Displays

Multimodal displays combine multiple sensory channels (vision, hearing, touch) to provide a more immersive experience and reduce the risk of overloading any single sense. By distributing information across different senses, the system makes interaction more natural and effective.

Visual Perception in AR

AR systems must match human visual perception to ensure virtual objects look realistic. This includes correct depth perception, proper size, brightness, contrast, and accurate alignment. If these factors are not matched, virtual objects may appear to float or misalign, causing eye strain or dizziness.

Tracking Techniques

Marker-based Tracking: Uses pre-defined visual patterns (such as fiducial markers or QR codes) that the camera can detect reliably, giving high accuracy in small-scale environments.
Marker-less Tracking: Uses natural features (edges, textures) and SLAM (Simultaneous Localization and Mapping) to track the environment without pre-placed markers.
Sensor-based Tracking: Uses hardware sensors (accelerometers, gyroscopes) to measure device orientation and movement.
Vision-based Pose Tracking: Tracks rigid objects using cameras to detect their 3D pose.
Body & Skeleton Tracking: Tracks human body movements and gestures using depth sensors.
Hybrid Tracking: Combines multiple methods to improve accuracy and robustness.

Registration and Calibration

Registration: The process of accurately aligning virtual objects with real-world objects.
Calibration: Measuring and adjusting system parameters (camera, display, sensors) to ensure tracking and geometric relationships are accurate.
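
Registration ultimately comes down to projecting a 3D anchor point into pixel coordinates, and calibration supplies the camera parameters that projection needs. The sketch below assumes an ideal pinhole camera with no lens distortion and a camera that looks along +Z; the function name `project_point` and the intrinsic values (`FX`, `FY`, `CX`, `CY`) are made up for illustration and would normally come from a calibration step.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates (metres) to pixel coordinates.

    Pinhole model without lens distortion; the camera looks along +Z.
    Returns None if the point is at or behind the camera plane.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    u = fx * x / z + cx   # horizontal pixel coordinate
    v = fy * y / z + cy   # vertical pixel coordinate (y grows downward here)
    return (u, v)


# Hypothetical intrinsics for a 1280x720 camera.
FX, FY, CX, CY = 1000.0, 1000.0, 640.0, 360.0

# A virtual label anchored 0.2 m to the right of and 0.1 m above the optical
# axis, 2 m in front of the camera ("above" is negative y in this convention).
pixel = project_point((0.2, -0.1, 2.0), FX, FY, CX, CY)
print(pixel)

# The same anchor with a focal length that is miscalibrated by 5%.
pixel_bad = project_point((0.2, -0.1, 2.0), FX * 1.05, FY, CX, CY)
print(pixel_bad)
```

With these numbers the label lands near pixel (740, 310), and the 5% focal-length error shifts it horizontally by about 5 pixels. That is why poor calibration shows up directly as visible misregistration.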

Homogeneous Coordinate System

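The homogeneous coordinate system extends Cartesian coordinates (x, y, z) by adding an extra coordinate (w), so that the point (x, y, z) is written as (x, y, z, 1). This allows translation, which is not a linear operation in Cartesian coordinates, to be expressed as a matrix multiplication alongside rotation and scaling, so a whole chain of transformations can be combined into a single matrix. This is essential for computer graphics, animation, and perspective projection.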
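
As a small illustration of the idea, the sketch below performs a 3D translation purely by matrix multiplication. The helper names (`mat_vec`, `translation`, `to_cartesian`) are illustrative and written in plain Python to keep the example self-contained; a real application would typically use a linear-algebra library.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    return [
        [1, 0, 0, tx],
        [0, 1, 0, ty],
        [0, 0, 1, tz],
        [0, 0, 0, 1],
    ]


def to_cartesian(h):
    """Convert homogeneous (x, y, z, w) back to Cartesian by dividing by w."""
    x, y, z, w = h
    return (x / w, y / w, z / w)


point = [1, 2, 3, 1]        # the Cartesian point (1, 2, 3), written with w = 1
direction = [0, 0, 1, 0]    # a direction vector, written with w = 0

moved = mat_vec(translation(10, 0, -5), point)
unchanged = mat_vec(translation(10, 0, -5), direction)

print(moved)                         # [11, 2, -2, 1]
print(unchanged)                     # [0, 0, 1, 0]
print(to_cartesian([2, 4, 6, 2]))    # (1.0, 2.0, 3.0)
```

The point is shifted by the translation, while the direction (w = 0) is unaffected. The last line shows that scaling all four coordinates by the same factor still describes the same Cartesian point.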

2D Transformations

Translation: Moving an object from one position to another.
Rotation: Turning an object about a fixed point (pivot point) by an angle.
Scaling: Changing the size of an object by enlarging or shrinking it.
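
The three transformations above can be written as 3x3 homogeneous matrices and chained by multiplication. The sketch below is illustrative (the helper names are not from any particular library) and shows how rotation about an arbitrary pivot is built from translate, rotate, and translate back.

```python
import math


def mat_mul(a, b):
    """Multiply two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def rotate(theta):
    """Counter-clockwise rotation about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def apply(m, point):
    """Apply a 3x3 homogeneous matrix to a 2D point (x, y)."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def rotate_about(pivot, theta):
    """Rotation about a pivot: move the pivot to the origin, rotate, move back."""
    px, py = pivot
    return mat_mul(translate(px, py), mat_mul(rotate(theta), translate(-px, -py)))


print(apply(translate(3, -1), (1, 1)))   # (4, 0)
print(apply(scale(2, 3), (1, 1)))        # (2, 3)

# Rotating (2, 1) by 90 degrees about the pivot (1, 1) should give roughly (1, 2).
x, y = apply(rotate_about((1, 1), math.pi / 2), (2, 1))
print(round(x, 9), round(y, 9))
```

Note the order of multiplication: the matrix written furthest to the right is applied to the point first.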

Geometric Modeling

Geometric modeling is the process of creating a mathematical representation of an object's shape and structure. It involves creating basic primitives (points, lines, polygons), applying transformations, and combining them into a complete model for rendering or simulation.

Window to Viewport Transformation

This process maps a selected area of the world-coordinate scene (the window) onto a specified area of the display device (the viewport). Objects are scaled and positioned to fit the viewport; they appear undistorted only if the window and viewport have the same aspect ratio.
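
The mapping can be written as a scale followed by an offset on each axis. The following is a minimal sketch; the function name `window_to_viewport` and the rectangle layout (xmin, ymin, xmax, ymax) are choices made for this example.

```python
def window_to_viewport(xw, yw, window, viewport):
    """Map a world-coordinate point inside `window` to device coordinates in `viewport`.

    Both rectangles are given as (xmin, ymin, xmax, ymax).
    """
    xw_min, yw_min, xw_max, yw_max = window
    xv_min, yv_min, xv_max, yv_max = viewport

    # Scale factors: viewport size divided by window size, per axis.
    sx = (xv_max - xv_min) / (xw_max - xw_min)
    sy = (yv_max - yv_min) / (yw_max - yw_min)

    xv = xv_min + (xw - xw_min) * sx
    yv = yv_min + (yw - yw_min) * sy
    return (xv, yv)


window = (0, 0, 100, 50)          # world-coordinate region to display
viewport = (200, 100, 600, 300)   # target rectangle on the screen, in pixels

print(window_to_viewport(50, 25, window, viewport))   # (400.0, 200.0)
print(window_to_viewport(0, 0, window, viewport))     # (200.0, 100.0)
```

Here both scale factors equal 4, so shapes are preserved. If the window and viewport had different aspect ratios, the two factors would differ and the image would be stretched along one axis.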

Virtual Reality (VR) Concepts

VR is a 3D technology that simulates sensory experiences to give the user a feeling of presence. It is based on the 3 I’s:

Immersion: The feeling of being completely involved in the virtual environment.
Interaction: The ability to control and manipulate virtual objects.
Imagination: The potential to design environments beyond real-world limitations.

VR System Components

Hardware: VR engine (computer), input devices (controllers, gloves), output devices (HMDs, audio), and tracking systems.
Software: Application software, databases for 3D assets, and development tools (e.g., Unity, Unreal, Maya).

Display Technology: LCD vs. OLED

Point	LCD	OLED
Light Source	Uses backlight	Self-emissive pixels
Thickness	Thicker display	Thinner and lighter
Contrast	Lower contrast	Very high contrast
Viewing Angle	Limited viewing angle	Wide viewing angle
Power Consumption	Roughly constant (backlight always on)	Content-dependent; lower for dark scenes
Response Time	Slower response	Faster response

Eye Movements and VR

Human eye movements (saccades, smooth pursuit, vergence) are essential for stable vision. Mismatches between these natural movements and VR display behavior (such as the Vergence-Accommodation Conflict) can lead to eye strain, headaches, and motion sickness.
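
One common way to quantify the Vergence-Accommodation Conflict is as the difference, in diopters (1 / distance in metres), between where the eyes converge and where they must focus. The sketch below assumes a headset whose optics have a single fixed focal distance; the 2 m value in `FOCAL_PLANE_M` is a hypothetical figure chosen for illustration, not a specification of any device.

```python
def diopters(distance_m):
    """Optical demand in diopters for a target at the given distance."""
    return 1.0 / distance_m


def va_conflict(vergence_distance_m, focal_distance_m):
    """Vergence-accommodation mismatch in diopters."""
    return abs(diopters(vergence_distance_m) - diopters(focal_distance_m))


FOCAL_PLANE_M = 2.0   # assumed fixed focal distance of the optics (hypothetical)

for d in (0.5, 1.0, 2.0, 10.0):
    conflict = va_conflict(d, FOCAL_PLANE_M)
    print(f"virtual object at {d:>4} m -> conflict {conflict:.2f} D")
```

Under this assumption the conflict is zero for content rendered at the focal distance and grows quickly for content rendered very close to the user, which is one reason near-field virtual objects tend to be the most tiring to look at.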

Frame Rate and Latency

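VR headsets commonly target at least 90 FPS to maintain a smooth experience. Low frame rates cause judder and motion blur, and high motion-to-photon latency (the delay between a head movement and the corresponding update on the display) makes the virtual world appear to lag behind the user, which can cause nausea. High-quality VR displays must prioritize high refresh rates and low latency to ensure immersion.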
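
A frame-rate target translates directly into a time budget per frame, which is often the more useful number when optimizing. The short calculation below is illustrative; the 12.5 ms render time is a hypothetical measurement.

```python
def frame_budget_ms(fps):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def fits_budget(render_time_ms, fps):
    """True if a frame that takes render_time_ms can be delivered at the given rate."""
    return render_time_ms <= frame_budget_ms(fps)


for fps in (60, 72, 90, 120):
    print(f"{fps:>3} FPS -> {frame_budget_ms(fps):.2f} ms per frame")

# A hypothetical 12.5 ms frame is fine at 72 FPS but misses the 90 FPS deadline.
print(fits_budget(12.5, 72), fits_budget(12.5, 90))   # True False
```

At 90 FPS the application has roughly 11.1 ms to simulate, render both eye views, and hand the frame to the display.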

Depth Perception in VR

Depth perception is achieved through monocular cues (size, height, motion parallax) and binocular cues (stereopsis). In VR, these cues must be accurately rendered to provide a realistic sense of distance and spatial relationships.
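
The binocular cue can be made concrete with the vergence angle, the angle between the two lines of sight when both eyes fixate the same point. The sketch below assumes a point straight ahead of the viewer and an interpupillary distance (IPD) of 63 mm, which is used here only as a typical illustrative value.

```python
import math


def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two eyes' lines of sight for a point straight ahead."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))


for d in (0.5, 1.0, 2.0, 10.0):
    print(f"{d:>4} m -> {vergence_angle_deg(d):.2f} degrees")
```

The angle is several degrees for an object at arm's length but only a fraction of a degree at 10 m, so binocular cues are most informative for nearby objects, while monocular cues such as relative size and motion parallax carry more of the load at a distance.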

Resolution in VR

High resolution is critical in VR because displays are placed very close to the eyes and magnified by lenses. Low resolution leads to the "screen door effect," where the gaps between individual pixels become visible as a fine grid, reducing immersion and causing eye strain. High-resolution displays require powerful GPUs and high memory bandwidth.
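
Two back-of-the-envelope figures help explain both halves of this trade-off: angular resolution in pixels per degree (PPD), and the number of pixels the GPU must shade every second. The headset numbers below (2000 x 2000 pixels per eye, a 100 degree horizontal field of view, 90 FPS) are hypothetical, and dividing pixels by field of view is only an average because lenses do not spread pixels evenly.

```python
def pixels_per_degree(h_pixels_per_eye, h_fov_deg):
    """Average angular resolution across the horizontal field of view."""
    return h_pixels_per_eye / h_fov_deg


def pixels_per_second(width, height, fps, eyes=2):
    """Raw number of pixels to be rendered each second for all eye views."""
    return width * height * eyes * fps


ppd = pixels_per_degree(2000, 100)
throughput = pixels_per_second(2000, 2000, 90)

print(f"{ppd:.0f} pixels per degree")          # 20 pixels per degree
print(f"{throughput:,} pixels per second")     # 720,000,000 pixels per second
```

For comparison, normal visual acuity is commonly cited as about one arcminute, or roughly 60 pixels per degree, so a display in this range still shows visible pixel structure while already demanding hundreds of millions of shaded pixels per second.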

Orientation Tracking

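Orientation tracking measures rotational motion (yaw, pitch, roll) using sensors such as gyroscopes, accelerometers, and magnetometers. Gyroscopes respond quickly but drift over time, while accelerometers (gravity) and magnetometers (magnetic north) give noisy but absolute references. Sensor fusion algorithms (e.g., Kalman filters) combine this data to provide stable, low-drift tracking, which is essential for preventing motion sickness in VR.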
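
To show the idea of sensor fusion in a few lines, the sketch below uses a complementary filter, a simpler technique than the Kalman filter mentioned above, and tracks only the pitch angle. Everything in the simulation is an assumption made for illustration: a stationary device, a constant gyroscope bias of 0.01 rad/s, a noise-free accelerometer, a 100 Hz update rate, a blend factor `alpha` of 0.98, and one particular sign convention for pitch.

```python
import math


def accel_pitch(ax, ay, az):
    """Pitch (radians) implied by the gravity direction; sign convention is assumed."""
    return math.atan2(-ax, math.sqrt(ay * ay + az * az))


def complementary_step(pitch, gyro_rate, accel_angle, dt, alpha=0.98):
    """One update: trust the gyroscope short-term and the accelerometer long-term."""
    gyro_estimate = pitch + gyro_rate * dt
    return alpha * gyro_estimate + (1.0 - alpha) * accel_angle


DT = 0.01          # 100 Hz updates
GYRO_BIAS = 0.01   # rad/s, constant bias on an otherwise motionless gyroscope

integrated = 0.0   # pitch from integrating the gyroscope alone
filtered = 0.0     # pitch from the complementary filter

for _ in range(1000):                              # simulate 10 seconds
    gyro_rate = 0.0 + GYRO_BIAS                    # true rate is zero
    accel_angle = accel_pitch(0.0, 0.0, 9.81)      # device lying flat
    integrated += gyro_rate * DT
    filtered = complementary_step(filtered, gyro_rate, accel_angle, DT)

print(f"gyro only: {integrated:.4f} rad, fused: {filtered:.4f} rad")
```

After ten simulated seconds the gyro-only estimate has drifted by about 0.1 rad, while the fused estimate stays within a few milliradians of the true value of zero. The accelerometer can only correct pitch and roll, because gravity carries no information about heading; correcting yaw drift requires a magnetometer or an external reference.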
