Software for Automated Creation of Training Images Using GenAI Techniques
AI and Machine Learning
Project Guide :
Sasha Apartsin
Development :
Start :
2025-10-29
Finish :
2026-03-08
Hebrew Year :
5786 (תשפ"ו)
Semesters :
1st & 2nd
Description
Alexander (Sasha) Apartsin, http://apartsin.faculty.ac.il/ alexanderap@hit.ac.il

INTRODUCTION
Many practical computer vision problems, such as image classification, object detection, and segmentation, can now be successfully addressed using modern deep neural networks (DNNs). However, these models require large volumes of high-quality labeled image data, which is often difficult or impossible to obtain. For example, in healthcare, training models to detect rare conditions from medical images is hindered by the limited availability of annotated examples and strict data privacy regulations. In cybersecurity, identifying new forms of visual spoofing or tampering (e.g., deepfakes, adversarial images) poses a challenge due to the lack of labeled attack samples. Fortunately, recent breakthroughs in generative AI, especially diffusion models, have made it possible to produce realistic, high-quality synthetic image data at scale and low cost.

BACKGROUND
This project aims to develop a suite of software packages for the automatic generation of synthetic image datasets using cutting-edge diffusion models and modern AI libraries. These tools will enable the creation of diverse, high-fidelity image datasets tailored to specific application domains, such as education, software engineering, cybersecurity, and healthcare. By making high-quality synthetic data accessible, the project addresses the growing need for reliable training and evaluation resources in scenarios where real-world image data is limited, sensitive, or expensive to obtain.

PROJECT SCOPE
The project team will carry out a series of synthetic image data generation tasks, focusing on achieving both visual realism and semantic coverage across key domains. After designing and validating generation methods, the team will use state-of-the-art pretrained computer vision models to establish meaningful performance baselines for each dataset.
Each task in the project will follow a structured workflow:
1. Designing a domain-specific image generation strategy.
2. Implementing the corresponding software package using diffusion models and modern libraries.
3. Running data generation experiments to produce and validate synthetic datasets.
4. Evaluating task-specific performance using pretrained vision models (e.g., CLIP, ViT, SAM).
5. Publishing the software, datasets, and baseline results for public use and further research.

STUDENT REQUIREMENTS
1. Proficiency in Python programming
2. Commitment to at least six weekly hours on average

DEVELOPMENT TOOLS
1. Programming Language: Python 3.x
2. GenAI programming libraries: HuggingFace diffusers, PyTorch
3. Development Environment: JupyterLab, VSCode, PyCharm

DELIVERABLES
Final Git Repository Contents (Image and Computer Vision Focus)
The final Git repository will contain all necessary components to support the generation, evaluation, and reuse of synthetic image datasets for computer vision tasks:
1. Source Code
   - Modular software packages built around diffusion models for synthetic image generation
   - Scripts for image generation workflows, dataset preprocessing, and baseline model evaluation
2. Generated Datasets
   - Synthetic image datasets for training and testing across domains such as healthcare, cybersecurity, and education
   - Accompanying metadata, annotations (e.g., bounding boxes, segmentation masks), and format specifications
3. Baseline Results
   - Evaluation scripts and configuration files for standard vision tasks (e.g., classification, detection, segmentation)
   - Performance benchmarks using off-the-shelf computer vision models (e.g., ViT, CLIP, SAM)
   - Visualizations and summary plots comparing model performance across synthetic and real datasets
4. Documentation
   - A comprehensive user manual detailing installation, setup, and usage
   - A developer guide for customizing or extending the generation pipeline
   - Descriptions of domain-specific data generation strategies, model configurations, and design rationale
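
ILLUSTRATIVE SKETCH
As a rough illustration of steps 1 to 3 of the workflow under PROJECT SCOPE, the following Python sketch builds a grid of labeled prompts and renders them with a HuggingFace diffusers pipeline. It is not the project's actual design: the names PromptSpec, build_prompt_grid, and generate_dataset, the prompt template, and the manifest.jsonl layout are all hypothetical, and the model identifier is left as a parameter because the description does not name one.

```python
import itertools
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass(frozen=True)
class PromptSpec:
    """One synthetic image to generate: class label, text prompt, and seed."""
    label: str
    prompt: str
    seed: int


def build_prompt_grid(labels, contexts, styles, base_seed=0):
    """Cross every label with every context and style (hypothetical strategy).

    Varying context and style per label is one simple way to widen
    semantic coverage; each spec gets its own seed for reproducibility.
    """
    specs = []
    combos = itertools.product(labels, contexts, styles)
    for offset, (label, context, style) in enumerate(combos):
        prompt = f"{style} of a {label} {context}"
        specs.append(PromptSpec(label=label, prompt=prompt, seed=base_seed + offset))
    return specs


def generate_dataset(specs, out_dir, model_id, device="cuda"):
    """Render each spec with a text-to-image pipeline and log a manifest.

    torch and diffusers are imported lazily so that the prompt design
    can be developed and tested on a machine without a GPU.
    """
    import torch
    from diffusers import DiffusionPipeline

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pipe = DiffusionPipeline.from_pretrained(model_id).to(device)

    with open(out_dir / "manifest.jsonl", "w", encoding="utf-8") as manifest:
        for index, spec in enumerate(specs):
            # A seeded generator makes each image reproducible from its spec.
            generator = torch.Generator(device=device).manual_seed(spec.seed)
            image = pipe(spec.prompt, generator=generator).images[0]
            file_name = f"{index:06d}.png"
            image.save(out_dir / file_name)
            # One JSON line per image: file name plus label, prompt, and seed.
            manifest.write(json.dumps({"file": file_name, **asdict(spec)}) + "\n")
```

Under these assumptions, a call such as build_prompt_grid(["cat", "dog"], ["on a sofa"], ["photo", "sketch"]) yields four specs, and the resulting manifest keeps each image tied to its label so that the pretrained models in step 4 can be scored against it.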
Emphasis in project execution
The project is carried out in cooperation with industry and combines meeting deadlines with creativity and focus on the task.
Status:
Shown in Available Projects