Synthetic Face Embeddings: Research Notes and Methodology

Face embeddings have become a cornerstone for advanced AI applications, from identity verification to personalized avatars. In our work, we generate compact numerical representations of faces that capture essential identity features, enabling robust recognition, clustering, and synthesis. This post documents our research and experiments in developing a fully synthetic face embedding model, EigenFace, and the corresponding dataset, EigenFace-256, both released under fully permissive licenses.
Face Embedding Fundamentals
Face embeddings are mathematical representations in a high-dimensional vector space, capturing identity-specific features while abstracting away environmental variables such as pose, lighting, and expression. Unlike traditional computer vision approaches built on hand-crafted features, embedding-based systems map faces into a space where distance between vectors reflects similarity of identity: embeddings of the same person cluster together, while embeddings of different people lie far apart. The recent integration of embedding techniques with diffusion models, such as the IP-Adapter and InstantID frameworks, has improved single-shot identity preservation in generated images, suggesting applications well beyond classification.
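To make this concrete, here is a minimal sketch of how face verification reduces to a similarity threshold. The 512-dimensional embeddings and the 0.4 threshold are illustrative assumptions, not values from our pipeline; real systems calibrate the threshold on a validation set.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Two 512-d embeddings (dimension assumed; ArcFace-style models commonly use 512).
emb_a = np.random.randn(512)
emb_b = np.random.randn(512)

# A verification system accepts the pair if similarity exceeds a tuned threshold.
THRESHOLD = 0.4  # illustrative value only
same_identity = cosine_similarity(emb_a, emb_b) > THRESHOLD
print(same_identity)
```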
Introducing EigenFace: A Fully Synthetic Face Embedding Model
Traditional face recognition models often rely on real-world datasets, which present several challenges:
- Restricted Licensing: Many high-performing models (e.g., ArcFace/InsightFace) are available for research only.
- Privacy Concerns: Datasets built from real people's images raise ethical and legal issues.
- Demographic Bias: Many existing datasets lack sufficient diversity, leading to biased outcomes.
To address these challenges, we developed EigenFace, a model trained exclusively on AI-generated faces. The architecture follows a ResNet-100 configuration similar to ArcFace, but training uses only synthetic facial data. Evaluation on the LFW benchmark yielded 91% accuracy, indicating that a fully synthetic training approach is viable for practical applications.
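For reference, here is a minimal PyTorch sketch of the additive angular margin loss used by ArcFace-style models. The scale s=64 and margin m=0.5 follow the defaults reported in the ArcFace paper and are not necessarily our exact training hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcMarginHead(nn.Module):
    """Additive angular margin head in the style of ArcFace (illustrative sketch)."""

    def __init__(self, embedding_dim: int, num_classes: int,
                 s: float = 64.0, m: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embedding_dim))
        self.s, self.m = s, m

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between L2-normalized embeddings and class centers.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only to the ground-truth class.
        one_hot = F.one_hot(labels, num_classes=self.weight.shape[0]).bool()
        logits = torch.where(one_hot, torch.cos(theta + self.m), cosine)
        return F.cross_entropy(self.s * logits, labels)
```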
EigenFace-256 Dataset Construction
Training a robust face embedding model requires a diverse dataset with multiple images per identity—capturing variations in angles, lighting, expressions, and age. Using real-world data for this purpose poses several challenges:
- Privacy and Ethics: Real-world images can lead to legal and ethical complications.
- Bias and Imbalance: Datasets based on real images may lack diversity.
- Data Labeling Complexity: Annotating large datasets is time-consuming and costly.
Synthetic Approach
To overcome these issues, we built EigenFace-256, a fully synthetic dataset designed to maintain consistent identity across controlled variations. Key features include:
- Controlled Variations: Multiple images per identity with different angles, lighting, and expressions.
- Age Diversity: Variations that allow models to generalize across different life stages.
- Ethically Sound: 100% synthetic images, eliminating privacy concerns.
- Balanced Data: Diverse training samples to reduce demographic bias.
Unlike many synthetic datasets that generate single-image identities, EigenFace-256 provides multiple controlled variations per identity, contributing to more robust embeddings.
Technical Methods
To ensure identity consistency across various conditions, we experimented with and combined several state-of-the-art techniques:
Latent Space Mapping for Identity Disentanglement
📌 Key Idea: Learn a mapping function that separates identity from other facial attributes using a pre-trained StyleGAN.
- Uses StyleGAN’s pre-trained latent space without requiring labeled datasets.
- Learns a mapping function that encodes identity separately from attributes like pose, expression, and illumination.
- Applications: Face de-identification, synthetic avatars, privacy-preserving face synthesis.
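A minimal sketch of the idea, assuming a simple MLP mapper over StyleGAN's 512-dimensional w space; the architecture and the identity/attribute split sizes are illustrative, not the published method's exact design.

```python
import torch
import torch.nn as nn

class IdentityAttributeMapper(nn.Module):
    """Maps a StyleGAN w-latent into separate identity and attribute codes.

    Hypothetical architecture for illustration: 512 matches StyleGAN's default
    w dimension; the 256/256 split is an assumption.
    """

    def __init__(self, w_dim: int = 512, id_dim: int = 256, attr_dim: int = 256):
        super().__init__()
        self.id_head = nn.Sequential(
            nn.Linear(w_dim, 512), nn.LeakyReLU(0.2), nn.Linear(512, id_dim))
        self.attr_head = nn.Sequential(
            nn.Linear(w_dim, 512), nn.LeakyReLU(0.2), nn.Linear(512, attr_dim))
        self.decoder = nn.Linear(id_dim + attr_dim, w_dim)  # back to w space

    def forward(self, w: torch.Tensor):
        z_id, z_attr = self.id_head(w), self.attr_head(w)
        w_rec = self.decoder(torch.cat([z_id, z_attr], dim=-1))
        return z_id, z_attr, w_rec

# Swapping attribute codes between two latents changes pose/lighting
# while (ideally) holding identity fixed.
mapper = IdentityAttributeMapper()
w1, w2 = torch.randn(1, 512), torch.randn(1, 512)
id1, _, _ = mapper(w1)
_, attr2, _ = mapper(w2)
w_mixed = mapper.decoder(torch.cat([id1, attr2], dim=-1))
```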
Pros & Cons:
✅ Minimal supervision (doesn’t require labeled data).
✅ High-quality image generation using pre-trained StyleGAN.
❌ Limited flexibility in fine-grained control over identity transformations.
DCFace: Dual-Condition Diffusion Model for Face Generation
📌 Key Idea: Uses diffusion models to independently control identity and other face attributes.
- Stage 1: Generate a high-fidelity face that defines the identity.
- Stage 2: Apply different attributes (pose, lighting) while preserving identity.
- Ensures identity and style are not mixed, improving robustness for face recognition applications.
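The sketch below is a conceptual toy of the dual-conditioning idea, not DCFace's actual U-Net: real DCFace conditions on an ID image (the stage-1 output) and a style image drawn from a style bank. The point it illustrates is that identity and style enter through separate branches, so neither factor can silently absorb the other.

```python
import torch
import torch.nn as nn

class DualConditionDenoiser(nn.Module):
    """Toy denoiser with separate identity and style conditioning branches."""

    def __init__(self, img_dim: int = 256, id_dim: int = 128, style_dim: int = 128):
        super().__init__()
        self.id_proj = nn.Linear(id_dim, 64)
        self.style_proj = nn.Linear(style_dim, 64)
        self.net = nn.Sequential(
            nn.Linear(img_dim + 64 + 64 + 1, 512),
            nn.SiLU(),
            nn.Linear(512, img_dim),
        )

    def forward(self, x_t, t, z_id, z_style):
        # Identity and style are projected independently before conditioning.
        cond = torch.cat([self.id_proj(z_id), self.style_proj(z_style)], dim=-1)
        return self.net(torch.cat([x_t, cond, t.float().unsqueeze(-1)], dim=-1))

# Stage 1 fixes the identity; stage 2 resamples styles under that identity.
model = DualConditionDenoiser()
z_id = torch.randn(1, 128)          # one synthetic identity
for _ in range(4):                  # four style variations
    z_style = torch.randn(1, 128)
    eps = model(torch.randn(1, 256), torch.tensor([500]), z_id, z_style)
```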
Pros & Cons:
✅ Better identity consistency than GANs.
✅ High control over lighting, expression, and pose.
❌ Computationally expensive (diffusion models require multiple inference steps).
❌ Less suited for real-time applications compared to GAN-based methods.
SynFace: Synthetic Face Generation for Face Recognition Training
📌 Key Idea: Enhances synthetic face diversity using identity and domain mixup techniques.
- Identity Mixup (IM): Creates more diverse faces by blending identity features across different individuals.
- Domain Mixup (DM): Combines synthetic and real face images to bridge the domain gap.
- Generates synthetic faces with intra-class variations (same identity, different poses, expressions).
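A minimal sketch of Identity Mixup; alpha=0.2 is a common mixup default and an assumption here, not necessarily the SynFace paper's setting. Domain Mixup is analogous but interpolates between synthetic and real images instead of identity codes.

```python
import numpy as np

def identity_mixup(id_a: np.ndarray, id_b: np.ndarray, alpha: float = 0.2):
    """Blend two identity codes, in the spirit of SynFace's Identity Mixup."""
    lam = np.random.beta(alpha, alpha)
    mixed = lam * id_a + (1.0 - lam) * id_b
    return mixed, lam

# Interpolating between two identity codes yields a new, in-between identity,
# enlarging the pool of distinct training subjects.
mixed_id, lam = identity_mixup(np.random.randn(512), np.random.randn(512))
```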
Pros & Cons:
✅ Stronger training datasets for AI face recognition without privacy concerns.
✅ Better generalization to real-world images.
❌ Does not provide the fine-grained control over identity attributes that GAN-based methods offer.
DiscoFaceGAN: 3D Imitative-Contrastive Learning for Disentanglement
📌 Key Idea: Embeds 3D priors into adversarial learning to achieve fully disentangled control over face attributes.
- 3D Morphable Models (3DMMs) structure the latent space, ensuring identity remains stable across variations.
- Generates realistic faces that do not correspond to any real person.
- Allows precise control over expression, pose, and lighting.
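A sketch of the factored latent sampling; the dimensions below are illustrative assumptions rather than the paper's exact split, but the key property holds: identity can be frozen while the other codes are resampled.

```python
import torch

def sample_factored_latent(batch: int, z_id: torch.Tensor | None = None) -> torch.Tensor:
    """Sample a latent factored into independent codes, DiscoFaceGAN-style."""
    z_id = z_id if z_id is not None else torch.randn(batch, 128)
    z_exp = torch.randn(batch, 32)     # expression
    z_illum = torch.randn(batch, 16)   # illumination
    z_pose = torch.randn(batch, 3)     # yaw / pitch / roll
    return torch.cat([z_id, z_exp, z_illum, z_pose], dim=-1)

# One synthetic identity under eight expression/lighting/pose combinations:
# exactly the intra-class variation an embedding model needs to see.
fixed_id = torch.randn(1, 128).expand(8, -1)
latents = sample_factored_latent(8, z_id=fixed_id)
```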
Pros & Cons:
✅ Highly controllable and interpretable face generation.
✅ Better identity disentanglement than StyleGAN-based approaches.
✅ Can embed real images into the disentangled latent space for editing.
❌ Slightly lower image quality compared to pure StyleGAN-based methods.
❌ Training requires 3D prior knowledge, making it more complex.
Arc2Face: ID-Consistent Human Face Generation
📌 Key Idea: Generates high-quality identity-consistent facial images based solely on ID embeddings from ArcFace.
- Uses ArcFace embeddings to condition Stable Diffusion, eliminating the need for text prompts.
- Generates multiple images per identity while preserving ID consistency across variations.
- Fine-tuned on a high-resolution dataset derived from WebFace42M, FFHQ, and CelebA-HQ.
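The embedding extraction side is straightforward with the InsightFace library; the generation call itself is shown only as a hypothetical stand-in, since the actual Arc2Face pipeline interface should be taken from the project's repository.

```python
import cv2
from insightface.app import FaceAnalysis

# Extract the ArcFace ID embedding that conditions generation.
app = FaceAnalysis(name="buffalo_l")  # bundled ArcFace-based model pack
app.prepare(ctx_id=0, det_size=(640, 640))

img = cv2.imread("reference_face.jpg")
faces = app.get(img)
id_embedding = faces[0].normed_embedding  # 512-d, L2-normalized

# generate_variations is a hypothetical stand-in for the Arc2Face pipeline
# call; consult the project's repository for the real interface.
# images = generate_variations(id_embedding, num_images=8)
```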
Pros & Cons:
✅ Superior identity consistency compared to text-based models.
✅ Does not require text-based guidance for subject retention.
✅ Highly scalable and can be integrated with existing facial recognition pipelines.
❌ Computationally expensive due to diffusion model requirements.
❌ Limited real-time application feasibility.
Flux-Schnell and Flux-Dev Models
📌 Key Idea: Directly prompting the models to generate structured age-progression image grids.
Used prompts like: "A high-quality DSLR photo grid displaying the same Asian man at various ages, ranging from child to teenager to adult to elder. The grid is arranged in a 3x3 format, with nine images in total. Each image shows the man from either the front or the side, with three images for each perspective, representing different stages of his life from childhood through to around 70 years old."
Each generated grid was then split into its component images and resized for inclusion in the dataset, as sketched below.
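A minimal sketch of the grid-splitting step; the 112x112 target size matches the input resolution of ArcFace-style recognition models and is an assumption about our exact preprocessing.

```python
from PIL import Image

def split_grid(path: str, rows: int = 3, cols: int = 3,
               out_size: tuple[int, int] = (112, 112)) -> list[Image.Image]:
    """Slice a generated grid image into individual, resized face crops."""
    grid = Image.open(path)
    w, h = grid.size
    tile_w, tile_h = w // cols, h // rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            box = (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
            tiles.append(grid.crop(box).resize(out_size, Image.Resampling.LANCZOS))
    return tiles

# Nine crops of the same identity at different ages and perspectives.
crops = split_grid("age_grid.png")
```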
Pros & Cons:
✅ Efficient generation of diverse age-progressed identities.
✅ Consistent identity preservation across multiple ages.
❌ Requires manual extraction and resizing.
❌ Limited control over variations beyond age and perspective.
Resource Challenges
Developing EigenFace and EigenFace-256 required addressing significant infrastructure challenges:
- Large Dataset Management: Multi-terabyte datasets introduced storage costs and inefficiencies when provisioning GPU instances.
- Cloud VM Constraints: Dynamic provisioning of GPU VMs often necessitated repeated data transfers, impacting workflow efficiency and incurring high costs.
Our research consumed several thousand dollars in compute resources, primarily in GPU rentals and API usage fees.
Conclusion
EigenFace and the EigenFace-256 dataset demonstrate that fully synthetic data can achieve performance levels comparable to real-world datasets while mitigating privacy and licensing concerns. Our work provides a reproducible framework and valuable insights for researchers working on face recognition and synthesis. We are committed to open-sourcing our code and dataset to foster further innovation and collaboration in the community.
Access the Model and Dataset
For those interested in exploring or building upon our work, the EigenFace model and EigenFace-256 dataset are available here:
- EigenFace Model: Hugging Face - EigenFace
- EigenFace-256 Dataset: Hugging Face - EigenFace-256