12th International Conference on Signal and Image Processing (SIGL 2025)

Accepted Papers

A Smart Singing Platform for Psychological Assessment and Stress Relief Using Artificial Intelligence and Virtual Reality

Shengtian Hong¹, Andrew Park², ¹Milton Academy, 170 Centre Street, Milton, MA 02186, ²Computer Science Department, California State Polytechnic University, Pomona, CA 91768.

ABSTRACT

SingSense addresses the gap in traditional VR singing applications by integrating real-time biometric feedback to create an adaptive and immersive musical experience. Using heart rate variability (HRV) data, the system dynamically adjusts environmental elements such as lighting and tempo, aligning the virtual performance space with the user's physiological state. Key challenges include accurate real-time HRV data capture, low-latency data transmission, and creating a responsive VR environment. Experiments demonstrated the system's effectiveness in enhancing user engagement and immersion through personalized interactions. By harmonizing technology with human emotion, SingSense offers a novel platform that redefines interactive music experiences.
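The abstract does not say which HRV measure SingSense uses or how it maps to the environment. As a rough sketch only, the example below computes RMSSD (a common time-domain HRV measure) from RR intervals and maps it linearly to a tempo. The function names, the thresholds, the BPM range, and the direction of the mapping are all hypothetical and are not taken from the paper.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("at least two RR intervals are required")
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def tempo_for_hrv(rmssd_ms, low=20.0, high=60.0, slow_bpm=70.0, fast_bpm=110.0):
    """Map RMSSD to a tempo by clamped linear interpolation.

    Hypothetical design choice: lower HRV selects a slower tempo.
    """
    if rmssd_ms <= low:
        return slow_bpm
    if rmssd_ms >= high:
        return fast_bpm
    fraction = (rmssd_ms - low) / (high - low)
    return slow_bpm + fraction * (fast_bpm - slow_bpm)


# Made-up RR intervals in milliseconds, for illustration only.
sample_rr = [800, 810, 790, 805]
print(round(rmssd(sample_rr), 2), tempo_for_hrv(rmssd(sample_rr)))
```

A real system would compute the measure over a sliding window of sensor data and smooth the output so that the tempo does not jump between updates.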

Keywords

Virtual Reality, Singing, Health, Artificial Intelligence

A Smart Performance Optimizing and Social Ranking Mobile Platform for Rowing using Computer Vision and Artificial Intelligence

Owen Arnold Brue¹, Samuel Silverberg², ¹Ransom Everglades School, 3575 Main Highway, Miami, FL 33133, ²Computer Science Department, California State Polytechnic University, Pomona, CA 91768.

ABSTRACT

Proper rowing training relies on professional trainers, but not everyone who wants to take up rowing has access to one. This application addresses the problem by applying pose analysis to videos of users’ rowing form [1]. After the algorithm analyzes their form, an AI model uses that data to compose a personalized message with advice on how to improve it. A major design challenge is turning the pose analysis into useful feedback. Instead of showing users how much their pose differs from the ideal form in coordinate data, we enlist AI language models [2], which translate the numerical differences into more natural-sounding advice. Rowers can use the application to improve their form. Novices who cannot train with a professional can learn proper form early and avoid injury [8]. Experts benefit too if they want to maintain good form and continue to stave off possible injuries.

Keywords

Rowing Form Analysis, Pose Estimation, AI-Powered Coaching, Personalized Feedback

ShotTrainer: An AI-Powered Basketball Training Tool for Enhanced Shooting Accuracy Using YOLO-Based Video Analysis

Baopu Tai¹, Mirna Shabo², ¹Maranatha High School, 169 S. Saint John Avenue, Pasadena, CA 91105, ²Computer Science Department, California State Polytechnic University, Pomona, CA 91768.

ABSTRACT

ShotTrainer is an AI-powered basketball training application designed to improve shooting accuracy by utilizing YOLO AI-based video analysis [1]. Traditional basketball training methods rely on manual observation, which can be inaccurate and time-consuming [2]. ShotTrainer automates shot detection and performance tracking, providing real-time feedback to players and coaches. Unlike previous research focusing on kinematic analysis, AI method reviews, or strategy-based coaching tools, ShotTrainer is a fully implemented, player-centric solution that allows users to analyze their shot attempts, makes, and misses effortlessly. The system was tested in an experiment analyzing 20 basketball shot videos under different conditions to measure its accuracy. The AI’s expected outputs were compared to actual predictions, resulting in 90% accuracy—with 18 correctly classified shots and 2 misclassified ones. The misclassifications were primarily due to lighting variations and camera angles, where overexposure (bright sunlight) and low-light conditions affected the AI’s ability to track the ball trajectory accurately. These issues suggest that the model was trained on a dataset that lacked diverse lighting and background conditions, impacting its performance in extreme scenarios. Future improvements will focus on expanding the training dataset and fine-tuning detection thresholds to enhance the model’s robustness and accuracy. By offering a cost-effective, accessible, and practical solution for basketball training, ShotTrainer bridges the gap between AI and real-world sports analytics, making it an essential tool for players, coaches, and enthusiasts aiming to enhance their shooting performance.
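The reported figure follows from comparing expected and predicted labels shot by shot: 18 correct out of 20 gives 0.90. The sketch below reproduces that arithmetic. The individual labels and the split between makes and misses are invented for illustration, since the abstract reports only the totals.

```python
def accuracy(expected, predicted):
    """Fraction of shots whose predicted label matches the expected label."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("at least one shot is required")
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return correct / len(expected)


# Hypothetical labels for 20 shot videos (the make/miss split is made up).
expected = ["make"] * 12 + ["miss"] * 8
predicted = list(expected)
predicted[0] = "miss"    # a made shot misread as a miss
predicted[19] = "make"   # a missed shot misread as a make

print(f"accuracy = {accuracy(expected, predicted):.0%}")
```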

Keywords

AI-based Training, Basketball Performance Analytics, YOLO Object Detection, Shot Accuracy Tracking

A Smart Drawing Platform for Seniors to Enhance Mental Well-being and Cognitive Function using Machine Learning and Artificial Intelligence

Michelle Chen¹, Laurie Delinois², ¹Weston High School, 444 Wellesley St, Weston, MA 02493, ²Computer Science Department, California State Polytechnic University, Pomona, CA 91768.

ABSTRACT

Both Parkinson’s disease (PD) and Alzheimer’s disease (AD) have become more prevalent worldwide over the years. This project aims to slow their effects by giving those afflicted a guided way to practice motor function and cognitive abilities through a drawing app that integrates AI-generated prompts. Challenges included maintaining image clarity when processing both 2D and 3D images into line art and ensuring that AI prompts were not so repetitive as to frustrate users. It was also a challenge to ensure that aesthetic issues, such as a lack of brush types and the UI design, would not deter the intended users through frustration and disinterest. These issues were resolved through experimentation and testing to determine the best values for each component of the program. For example, the experiment on the image processing scene offered insight into the best edge depth value for high clarity in processed 3D images. Our second experiment showed that the specificity of the prompt given to the ChatGPT API also matters, as a lack of specifics can cause redundancy in the prompts given to users. Ultimately, this application gives those with AD and PD an opportunity to experience a form of art therapy in a calm, individual environment where they can maintain their motor skills at their own pace, providing an alternative to more traditional therapy routes.

Keywords

Art therapy, AI-generated prompts, cognitive training, motor function rehabilitation

PREDICTIVE POLICING AS A THREAT TO JUSTICE – WHY ALGORITHMS SHOULD SERVE COMMUNITIES, NOT CONTROL THEM

Theodora-Stavroula Korma, Department of Communication and Information Studies, Rijksuniversiteit Groningen, Groningen, The Netherlands

ABSTRACT

Predictive policing, an algorithm-driven crime prevention initiative, claims to render the criminal justice system more effective and neutral. Yet this essay argues that these algorithmic models reinforce system-level prejudices and focus disproportionately on marginalized populations while amplifying injustice. As these models draw from historical data covering four decades shaped by biased police operations, they can magnify racial profiling and harden social hierarchies. Furthermore, these systems' lack of transparency and accountability raises ethical concerns about surveillance, due process, and civil rights violations. In line with Design Justice principles, this paper calls for a redesign of predictive policing that is not about control by systems but the empowerment of communities. Instead of being used as enforcement tools, these algorithms must be redesigned to address root causes of social harm, promote equitable resource allocation, and engage communities in decision-making. Through participatory governance and moral algorithmic design, predictive technologies can serve justice rather than subvert it, so that communities are protected, not monitored.

Keywords

Predictive policing, algorithmic bias, systemic injustice, racial profiling, Design Justice.

Information Retrieval vs Cache Augmented Generation vs Fine Tuning: A Comparative Study on Urdu Medical Question Answering

Ahmad Mahmood¹, Zainab Ahmad¹, Iqra Ameer², and Grigori Sidorov¹, ¹Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico, ²Division of Science and Engineering, The Pennsylvania State University, Abington, PA, USA

ABSTRACT

The development of medical question-answering (QA) systems has predominantly focused on high-resource languages, leaving a significant gap for low-resource languages like Urdu. This study proposes a novel corpus designed to advance medical QA research in Urdu, created by translating the benchmark MedQuAD corpus into Urdu using a generative AI-based translation technique. The proposed corpus is evaluated using three approaches: (i) Information Retrieval (IR), (ii) Cache-Augmented Generation (CAG), and (iii) Fine-Tuning (FT). We conducted two experiments, one on a 500-instance subset and another on the complete 3,152-question corpus, to assess retrieval effectiveness, response accuracy, and computational efficiency. Our results show that JinaAI embeddings outperformed other IR models, while fine-tuning (FT) OpenAI 4o mini achieved the highest response accuracy (BERTScore: 70.6%) but is computationally expensive. CAG eliminates retrieval latency but requires high resources. Findings suggest that IR is optimal for real-time QA, Fine-Tuning ensures accuracy, and CAG balances both. This research advances Urdu medical AI, bridging healthcare accessibility gaps.
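The abstract does not describe how the IR approach scores candidates. A common setup, assumed here, embeds the question and every stored QA pair and returns the stored answers whose embeddings are closest by cosine similarity. The sketch below uses made-up 3-dimensional vectors in place of real model embeddings, and the record layout (`id`, `embedding`) is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, corpus, top_k=1):
    """Return the top_k corpus records most similar to the query embedding."""
    ranked = sorted(
        corpus,
        key=lambda record: cosine_similarity(query_embedding, record["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy corpus: the embeddings are invented, not produced by any real model.
corpus = [
    {"id": "qa-1", "embedding": [1.0, 0.0, 0.0]},
    {"id": "qa-2", "embedding": [0.0, 1.0, 0.0]},
    {"id": "qa-3", "embedding": [0.7, 0.7, 0.0]},
]
query = [0.9, 0.1, 0.0]
print([record["id"] for record in retrieve(query, corpus, top_k=2)])
```

At the scale of a few thousand questions, a linear scan like this is usually fast enough. Larger corpora typically use an approximate nearest-neighbor index instead.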

Keywords

Information retrieval, retrieval-augmented generation, cache-augmented generation, fine-tuning, Urdu medical question-answering

Reading Design-oriented Sociology in the Design Work of a Trustworthy Governable Platform for Institutional Communication in Relational Enterprises: Affordances in Action Up to User Needs

Gianni Jacucci, Department of Information Engineering and C.S., University of Trento, Italy

ABSTRACT

This focused and dedicated review essay examines the architectural design work of a Trustworthy Governable Platform (TGP) that redefines the information paradigm for institutional communication platforms, particularly in relational enterprises, where complex social dynamics are at play. Examples include addressing accountability and ensuring contextual integrity. We adopt a design-oriented sociological perspective to uncover how the platform's creator ensured that its social use and meaning align with the specific situational needs from the outset. The study begins by examining the key features of social interactions that characterise relational enterprises, as identified and articulated by the author of the design process through selected theoretical approaches. Next, we analyse the communicative affordances of technology in action that are necessary to realise these social interaction features. Finally, we explore the design elements of the supporting information and communication infrastructure that the author deemed essential to enable the appropriate options for action. These elements ensure users experience the intended social usage and meaning of the technology in the context of relational enterprises. The purpose of the essay is to stimulate discussion on what we deem a most promising enterprise. TGP has been under active technical development for a number of years and, at the time of writing, is undergoing initial trials for deployment and appropriation.

Keywords

Relational Enterprises, Communications on Information, Entrepreneurial, Management & Organisation Cybernetics, Socio-cybernetics, Complex Adaptive System and Theory.

Humanizing AI: A Human-centered Architecture to Developing Trustworthy Intelligent Systems

Muhammad Uzair Akmal¹, Selvine George Mathias¹, Saara Asif¹, Leonid Koval¹, Simon Knollmeyer¹, and Daniel Grossmann², ¹AImotion Bavaria Technische Hochschule Ingolstadt, Germany, ²Faculty of Computer Science and Data Processing Technische Hochschule Ingolstadt, Germany

ABSTRACT

The lack of trust and fairness in artificial intelligence (AI) systems, driven by biases, misclassified data, lack of transparency, and limited interoperability, raises significant ethical concerns and socioeconomic impacts. This study presents a reference architecture for an AI pipeline aligned with Industry 5.0 principles, focusing on human-centered design, sustainability, social responsibility, and resilience. It enhances human-AI collaboration by involving four user types (data scientists, domain experts, organizations, and end users) who share decision-making responsibilities during the AI system development process. The architecture incorporates Active Learning (AL) to address data bias and misclassification issues and Transfer Learning (TL) to ensure model reusability in resource-constrained environments. Post-modeling Explainability gives stakeholders insight into model behavior and outcomes, fostering transparency and trust. Additionally, two user-ranked custom validation metrics evaluate the architecture and calculate Mean Average Precision (MAP) for Rankings. These metrics ensure the architecture design and outcomes adhere to ethical AI principles while promoting collaborative, responsible, and sustainable AI development.
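The abstract does not define its two custom metrics or how user rankings feed into them. For orientation, the sketch below implements the standard information-retrieval definition of Mean Average Precision over a set of rankings. The item names and relevance sets are hypothetical, and each ranking is assumed to contain no duplicate items.

```python
def average_precision(ranked_items, relevant_items):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings):
    """Mean of average precision over (ranked_items, relevant_items) pairs."""
    if not rankings:
        raise ValueError("at least one ranking is required")
    return sum(average_precision(r, rel) for r, rel in rankings) / len(rankings)


# Hypothetical rankings from two users, each with its own relevant set.
user_rankings = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"y"}),            # AP = 1/2
]
print(round(mean_average_precision(user_rankings), 4))
```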

Keywords

Artificial intelligence, Human-centric AI, Active learning, Transfer learning, Explainable AI, Intelligent systems, Industry 5.0.

Balancing Privacy and Innovation: A VAE Framework for Synthetic Healthcare Data Generation

Saritha Kondapally, Senior Member IEEE

ABSTRACT

The growing reliance on data-driven innovation in healthcare often collides with the critical need to protect patient privacy, creating a tension between progress and compliance. This study bridges that gap by introducing a Variational Autoencoder (VAE)-based framework to generate synthetic healthcare data that mirrors real-world datasets while ensuring privacy preservation. By leveraging synthetic EHRs created using the Synthea tool, the framework achieves a balance between statistical fidelity and data utility, enabling secure sharing and collaboration without compromising sensitive information. Through rigorous evaluation of distributional alignment and predictive performance, this work demonstrates the promise of synthetic data in unlocking the full potential of AI-driven healthcare solutions, offering a path to innovation that respects both privacy and progress.

Keywords

Privacy-Preserving Data Generation, Variational Autoencoders (VAEs), Synthetic Healthcare Data, Generative AI in Healthcare, HIPAA & GDPR Compliance, Electronic Health Records (EHRs), Data Privacy and Utility Trade-off, Machine Learning for Healthcare, AI, Federated Learning & Differential Privacy, Data Sharing & Secure Collaboration, Feature Engineering, FHIR Standard for Interoperability

Spiking Neural Networks and Artificial Neural Networks for Intrusion Detection Systems

Leonard Knapp¹, Sven Nitzsche¹, Matthias Börsig¹, Alexandru asilache¹, Ingmar Baumgart¹, and Jürgen Becker², ¹FZI Research Center for Information Technology, Karlsruhe, Germany, ²Karlsruhe Institute of Technology, Karlsruhe, Germany

ABSTRACT

In computer networks, protection against potential threats is paramount, requiring robust security measures. However, traditional rule-based Intrusion Detection Systems (IDSs) often fail to adapt to dynamic environments, prompting the exploration of innovative solutions such as Neural Network (NN)-based approaches. This research explores the efficacy of Spiking Neural Networks (SNNs) as the sole data processor in IDSs, which differentiates this approach from previous work. Through extensive experimentation on the NSL-KDD, CIC-IDS-2017, CIC-IOT-2023, and AWID3 datasets, various SNN configurations were examined alongside conventional Artificial Neural Networks (ANNs). The results highlight the promising performance of SNNs, which achieved accuracies of up to 99.22% on these datasets using rate and density encoding. Furthermore, a comparative analysis reveals the competitive advantage of SNNs over their ANN counterparts in generating fewer false positives at equivalent accuracy, emphasizing their adaptability to time-dependent data. This study thoroughly evaluates the achievable accuracy of an IDS built from spiking and artificial neurons within a feed-forward fully connected topology. For the spiking neurons, the Leaky Integrate-and-Fire (LIF) model is selected. The results obtained by this approach support a paradigm shift towards SNN-based IDSs to strengthen network security, although further research is essential to ensure broader applicability and scalability.
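To make the two named building blocks concrete, the sketch below shows a simple discrete-time LIF neuron driven by a rate-encoded input. It is an illustration under assumptions, not the authors' configuration: the weight, leak factor, threshold, hard reset to zero, and Bernoulli form of rate encoding are all hypothetical choices, and density encoding is not covered.

```python
import random


def rate_encode(value, num_steps, rng):
    """Bernoulli rate coding: a feature in [0, 1] is the per-step spike probability."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be normalized to [0, 1]")
    return [1 if rng.random() < value else 0 for _ in range(num_steps)]


def lif_neuron(input_spikes, weight=0.6, beta=0.9, threshold=1.0):
    """Discrete-time Leaky Integrate-and-Fire neuron with a hard reset.

    Each step the membrane potential decays by `beta`, integrates the weighted
    input spike, and emits an output spike when it reaches `threshold`.
    """
    potential = 0.0
    output_spikes = []
    for spike in input_spikes:
        potential = beta * potential + weight * spike
        if potential >= threshold:
            output_spikes.append(1)
            potential = 0.0  # hard reset after firing (an assumed choice)
        else:
            output_spikes.append(0)
    return output_spikes


rng = random.Random(0)  # fixed seed so the illustration is repeatable
spike_train = rate_encode(0.8, num_steps=20, rng=rng)
print(sum(spike_train), sum(lif_neuron(spike_train)))
```

A feed-forward fully connected SNN stacks layers of such neurons, with each neuron summing weighted spikes from every neuron in the previous layer. The class is typically read out from output spike counts.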

Keywords

Spiking Neural Network, Artificial Neural Network, Intrusion Detection System, Computer Network.

Deep Learning-Enhanced Drunk Detection in Vietnam Using Thermal Imaging and Advanced Image Processing

Ngo Dinh Luan, Nguyen Van Thanh Thong, Hoang Ngoc Dung, FPT University, Vietnam

ABSTRACT

Alcohol intoxication is a major cause of traffic accidents and a potential threat to public safety, especially in Vietnam. Traditional detection methods, such as breathalyzers or blood tests, are invasive to privacy, time-consuming, and require the subject's cooperation. This study provides an innovative, non-invasive method for detecting alcohol-induced cognitive impairment on the street using thermal imaging technology. Our method provides real-time drunkenness recognition by analyzing specific facial temperature variations, features, and physiological patterns using deep learning models such as convolutional neural networks (CNNs). Experimental evaluations demonstrate the advantages of this method, providing a non-contact, real-time solution that is important for law enforcement, healthcare, and traffic safety.

Keywords

Drunk identification, Convolutional Neural Network, Thermal image, Image Processing.

Attribute Extraction for E-Commerce Fashion Data

Apurva Sinha and Ekta Gujral, Walmart Global Tech, Sunnyvale, USA

ABSTRACT

Product attribute extraction is a growing field in e-commerce business, with several applications including product ranking, product recommendation, future assortment planning, and improving online shopping customer experiences. Understanding customer needs is a critical part of online business, specifically for fashion products. Retailers use assortment planning to determine the mix of products to offer in each store and channel, to stay responsive to market dynamics, and to manage inventory and catalogs. The goal is to offer the right styles, in the right sizes and colors, through the right channels to foster customer loyalty. In this paper we present PAE, a product attribute extraction algorithm for future trend reports consisting of text and images in PDF format. Most existing methods focus on attribute extraction from titles or product descriptions or utilize visual information from existing product images. Compared to prior work, our work focuses on attribute extraction from PDF files where upcoming fashion trends are explained. Our contributions are three-fold: (a) We develop PAE, an efficient framework to extract attributes from unstructured data (text and images); (b) We provide a catalog matching methodology based on BERT representations to discover the existing attributes using upcoming attribute values; (c) We conduct extensive experiments with several baselines and show that PAE is an effective and flexible framework that is on par with or superior to (avg 92.5% F1-Score) the existing state of the art for the attribute value extraction task.

Keywords

Attribute Extraction, PDF files, Large Language Model (LLM), Text and Images, BERT embeddings

Emotion Predictions of Sentiment During Covid Pandemic using Intelligent Chatbots

Venkata Duvvuri, Chetan Kulkarni, Sritha Gogineni

ABSTRACT

The COVID-19 pandemic has had a major impact around the world. Governments and businesses, small and large, face unprecedented decisions on whether to close, reopen, or pursue other policies based on public sentiment. While understanding this sentiment and the accompanying emotions has been researched, especially in social media channels like Twitter, we propose a novel way to capture sentiment and emotions using intelligent chatbots (EmoBot) that reduces the participant biases inherent in prior analyses. We devise Emotion Extraction Layers (EEL) based on recent deep learning techniques like BERT (Bidirectional Encoder Representations from Transformers) and compare these models with traditional machine learning models. We show for a variety of emotions (Sad, Fearful, and Angry) that the new deep learning models predict 1-5% better than traditional machine learning techniques. Further, we show that leveraging retail sentiment data through transfer learning can help cross the cold-start chasm of having no chatbot data initially, and that this technique comes within 8% of the performance achieved with sufficient COVID sentiment data.

Keywords

COVID, sentiment analysis, chatbots, BERT, deep learning, transfer learning.

Optimizing Energy Forecasting in Buildings: Comparative Analysis of Machine Learning Algorithms

Shafaq Khan, Anirudh Grack, Saksham Thukral, Kausar Fatema, Aryan Batra, Jaykumar M Kadiwala, School of Computer Science, University of Windsor, Windsor, ON

ABSTRACT

Predicting energy consumption in buildings based on factors such as size, design, usage patterns, and weather presents significant challenges. This research focuses on developing data-driven models for early-phase energy forecasting in buildings. This initiative is pivotal for energy-efficient building design and contributes significantly to energy planning, management, and conservation efforts. Using data from 1500 buildings across various categories, the study integrates data preprocessing, mining, and analysis techniques along with machine learning algorithms such as Linear Regression, Decision Trees (DT), Extreme Gradient Boosting (XGBoost), and Light Gradient-Boosting Machine (LightGBM). Unique to the approach is the explicit incorporation of advanced feature engineering techniques and weather data into the modeling process, filling critical gaps in existing methodologies. This study demonstrates that LightGBM outperforms alternative models and ensures the designed model is adaptable across various meter types, enhancing its universal application. Aligned with the United Nations (UN) 2030 agenda, the research is dedicated to advancing global environmental sustainability by significantly reducing CO2 emissions and fostering substantial investments in building energy efficiency.

Keywords

Forecasting, Energy consumption, Light-GBM, XGBoost, Decision Trees, Linear Regression, Load prediction.

The Role of Artificial Intelligence in Enhancing Diagnostic Accuracy and Patient Outcomes in Healthcare

Anurag Sing and Sheenu Rizvi, Department of Computer Science & Engineering, Amity University, India

ABSTRACT

Artificial Intelligence (AI) has emerged as a transformative force in medicine, providing innovative ideas that enhance diagnostic accuracy and improve patient outcomes. The incorporation of AI technologies like machine learning (ML), deep learning (DL), and natural language processing (NLP) into diagnostic processes holds the potential to revolutionize medical practice. By rapidly analyzing large datasets, recognizing complex patterns, and facilitating data-driven decisions, AI is aiding healthcare professionals in reducing diagnostic errors and delivering personalized treatments. This paper provides an in-depth analysis of AI's impact on diagnostic processes in healthcare, real-world applications, benefits, challenges, ethical considerations, and future directions for AI technologies. The paper also explores how AI-enabled systems are reshaping healthcare workflows and empowering healthcare providers to make more informed decisions, ultimately leading to better patient outcomes.

Keywords

Artificial Intelligence, Healthcare, Diagnostic Accuracy, Machine Learning, Deep Learning, Personalized Medicine, Patient Outcomes, Ethical Considerations.

A Creative System to Generate and Play Music and Lyrics using Flutter and AI Technologies

Yuting Gao¹, Ang Li², ¹Crean Lutheran High School, 12500 Sand Canyon Ave, Irvine, CA 92618, ²Computer Science Department, California State University, Long Beach, CA 90840

ABSTRACT

MuseComposer is a user-friendly app for generating music and lyrics using AI [1]. It simplifies music creation by offering intuitive interfaces and personalized outputs based on user prompts. Experiments revealed high satisfaction with ease of use and melody generation but identified improvements needed in lyrics accuracy [2]. The app's integration of AI, a robust database, and dynamic playback ensures accessibility and creativity. With future refinements, MuseComposer can redefine music composition for users of all skill levels [9].

Keywords

AI-powered music generation, User-friendly interface, Personalized melodies, Creative lyric generation, Music composition accessibility.

Machine Learning for Depression Prediction: a Comparative Study of Ensemble Models

Shakthi I. Weerawansa, Uthayasanker Thayasivam, Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka

ABSTRACT

Depression is a major global health concern, and early identification is crucial for effective intervention and therapy. This study explores machine learning techniques to predict depression risk using a dataset from a mental health survey. Three models—XGBoost, Random Forest, and Logistic Regression—were evaluated based on accuracy, precision, recall, and F1-score. XGBoost achieved the highest accuracy (93.51%) and recall (80%) for the minority class, demonstrating its superiority in handling imbalanced datasets. Our findings revealed that financial stress, work pressure, and sleep duration were the most influential predictors. These findings highlight the potential of machine learning in developing automated mental health screening tools.
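Accuracy and minority-class recall measure different things, which is why the abstract reports both. The sketch below computes precision, recall, and F1 for a chosen positive class from true and predicted labels. The toy labels are invented to produce a recall of 0.8 and are unrelated to the study's survey data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy imbalanced data: 5 positive (minority) cases and 15 negative cases.
y_true = [1] * 5 + [0] * 15
y_pred = [1, 1, 1, 1, 0] + [1] + [0] * 14  # one missed positive, one false alarm

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On imbalanced data, a model that always predicts the majority class can score high accuracy with zero minority recall. Per-class metrics make that failure visible.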