Center for Digital Innovation

GenComUI: Exploring Generative Visual Aids as Medium to Support Task-Oriented Human-Robot Communication

This work investigates the integration of generative visual aids in human-robot task communication. We developed GenComUI, a system powered by large language models (LLMs) that dynamically generates contextual visual aids—such as map annotations, path indicators, and animations—to support verbal task communication and facilitate the generation of customized task programs for the robot. This system was informed by a formative study that examined how humans use external visual tools to assist verbal communication in spatial tasks. To evaluate its effectiveness, we conducted a user experiment (n = 20) comparing GenComUI with a voice-only baseline. Qualitative and quantitative analyses demonstrate that generative visual aids enhance verbal task communication by providing continuous visual feedback, thereby promoting natural and effective human-robot communication. Additionally, the study offers a set of design implications, emphasizing how dynamically generated visual aids can serve as an effective communication medium in human-robot interaction. These findings underscore the potential of generative visual aids to inform the design of more intuitive and effective human-robot communication, particularly for complex communication scenarios in human-robot interaction and LLM-based end-user development.[......]

Continue reading

Walk in Their Shoes to Navigate Your Own Path: Learning About Procrastination Through A Serious Game

Procrastination, the voluntary delay of tasks despite potential negative consequences, has prompted numerous time and task management interventions in the HCI community. While these interventions have shown promise in addressing specific behaviors, psychological theories suggest that learning about procrastination itself may help individuals develop their own coping strategies and build mental resilience. However, little research has explored how to support this learning process through HCI approaches. We present ProcrastiMate, a text adventure game where players learn about procrastination’s causes and experiment with coping strategies by guiding in-game characters in managing relatable scenarios. Our field study with 27 participants revealed that ProcrastiMate facilitated learning and self-reflection while maintaining psychological distance, motivating players to integrate newly acquired knowledge in daily life. This paper contributes empirical insights on leveraging serious games to facilitate learning about procrastination and offers design implications for addressing psychological challenges through HCI approaches.[......]

Continue reading

Align with Me, Not TO Me: How People Perceive Concept Alignment with LLM-Powered Conversational Agents

Concept alignment—building a shared understanding of concepts—is essential for human and human-agent communication. While large language models (LLMs) promise human-like dialogue capabilities for conversational agents, the lack of studies to understand people’s perceptions and expectations of concept alignment hinders the design of effective LLM agents. This paper presents results from two lab studies with human-human and human-agent pairs using a concept alignment task. Quantitative and qualitative analysis reveals and contextualizes potentially (un)helpful dialogue behaviors, how people perceived and adapted to the agent, as well as their preconceptions and expectations. Through this work, we demonstrate the co-adaptive and collaborative nature of concept alignment and identify potential design factors and their trade-offs, sketching the design space of concept alignment dialogues. We conclude by calling for designerly endeavors on understanding concept alignment with LLMs in context, as well as technical efforts to combine theory-informed and LLM-driven approaches. [......]

Continue reading

Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning

An increasingly massive number of remote-sensing images spurs the development of extensible object detectors that can detect objects beyond training categories without the costly collection of new labeled data. In this paper, we aim to develop an open-vocabulary object detection (OVD) technique for aerial images that scales up object vocabulary size beyond the training data. The performance of OVD greatly relies on the quality of class-agnostic region proposals and pseudo-labels for novel object categories. To simultaneously generate high-quality proposals and pseudo-labels, we propose CastDet, a CLIP-activated student-teacher open-vocabulary object Detection framework. Our end-to-end framework, following the student-teacher self-learning mechanism, employs the RemoteCLIP model as an extra omniscient teacher with rich knowledge. By doing so, our approach boosts not only novel object proposals but also classification. Furthermore, we devise a dynamic label queue strategy to maintain high-quality pseudo-labels during batch training. We conduct extensive experiments on multiple existing aerial object detection datasets, which are set up for the OVD task. Experimental results demonstrate that CastDet achieves superior open-vocabulary detection performance, e.g., reaching 46.5% mAP on VisDroneZSD novel categories, which outperforms the state-of-the-art open-vocabulary detectors by 21.0% mAP. To the best of our knowledge, this is the first work to apply and develop the open-vocabulary object detection technique for aerial images. The code is available at https://github.com/lizzy8587/CastDet.[......]
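The dynamic label queue described above can be pictured as a bounded, confidence-ordered buffer of pseudo-labels. The sketch below is a toy illustration under our own assumptions (a single min-heap with weakest-first eviction; the class name `DynamicLabelQueue` is hypothetical), not the paper's actual implementation, which involves CLIP-based scoring:

```python
import heapq

class DynamicLabelQueue:
    """Bounded buffer that keeps only the highest-confidence pseudo-labels.

    Hypothetical sketch: a min-heap whose root is the weakest label, so a
    new, stronger pseudo-label can evict it during batch training.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []      # entries: (confidence, insertion_id, label)
        self._next_id = 0    # tie-breaker so labels never get compared

    def push(self, label, confidence):
        entry = (confidence, self._next_id, label)
        self._next_id += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif confidence > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the weakest label

    def sample(self):
        """Return the current labels, strongest first, for the next batch."""
        return [label for _, _, label in sorted(self._heap, reverse=True)]
```

With capacity 2, pushing labels with confidences 0.9, 0.4, and 0.7 keeps only the two strongest, so the queue's quality never degrades as training proceeds.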

Continue reading

Cocobo: Exploring Large Language Models as the Engine for End-User Robot Programming

End-user development allows everyday users to tailor service robots or applications to their needs. One user-friendly approach is natural language programming. However, it encounters challenges such as an expansive user expression space and limited support for debugging and editing, which restrict its application in end-user programming. The emergence of large language models (LLMs) offers promising avenues for the translation and interpretation between human language instructions and the code executed by robots, but their application in end-user programming systems requires further study. We introduce Cocobo, a natural language programming system with interactive diagrams powered by LLMs. Cocobo employs LLMs to understand users’ authoring intentions, generate and explain robot programs, and facilitate the conversion between executable code and flowchart representations. Our user study shows that Cocobo has a low learning curve, enabling even users with zero coding experience to customize robot programs successfully. [......]
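Cocobo's conversion between executable code and flowchart representations is LLM-driven; as a rough, non-authoritative illustration of the underlying idea, the toy sketch below maps a small Python program onto flowchart nodes using the standard library's `ast` module (the node shapes and labels are our assumptions):

```python
import ast

def code_to_flowchart(source):
    """Toy stand-in for code-to-diagram conversion: turn top-level
    statements into (shape, label) flowchart nodes."""
    nodes = [("start", "Start")]
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.If):
            # Conditionals become decision diamonds labeled by their test.
            nodes.append(("decision", ast.unparse(stmt.test)))
        else:
            # Everything else becomes a process box.
            nodes.append(("process", ast.unparse(stmt)))
    nodes.append(("end", "End"))
    return nodes
```

A program such as `x = 1` followed by `if x > 0: ...` yields a Start node, a process box `x = 1`, a decision diamond `x > 0`, and an End node; a real system would also handle branches, loops, and round-tripping edits back into code.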

Continue reading

MRehab: A Mixed Reality Rehabilitation System Supporting Integrated Speech and Hand Training

Integrated speech and hand-motor training is an effective post-stroke rehabilitation method. However, few interactive systems and assistive technologies have been developed in this field. Driven by this challenge, we leverage Mixed Reality technology, which merges immersive virtual scenarios with physical hands-on tools in the real world, to provide patients with multi-modal interactions and engaging training experiences. Following a user-centered design approach, we first interviewed seven therapists to identify user requirements and design considerations. We then designed MRehab, an interactive rehabilitation system that allows patients to regain speech and hand skills through MR scenarios that depict daily living activities. We conducted a preliminary user test with 12 patients and 5 therapists to validate the feasibility of MRehab and understand the user experience. The results confirmed its feasibility for hand-motor training. Additionally, the patients expressed high motivation, engagement, and a positive attitude toward using MRehab. Our findings demonstrate the potential of MR technology in integrated speech and hand function rehabilitation training.[......]

Continue reading

SCANeXt: Enhancing 3D medical image segmentation with dual attention network and depth-wise convolution

Existing approaches to 3D medical image segmentation can be generally categorized into convolution-based or transformer-based methods. While convolutional neural networks (CNNs) demonstrate proficiency in extracting local features, they encounter challenges in capturing global representations. In contrast, the consecutive self-attention modules present in vision transformers excel at capturing long-range dependencies and achieving an expanded receptive field. In this paper, we propose a novel approach, termed SCANeXt, for 3D medical image segmentation. Our method combines the strengths of dual attention (Spatial and Channel Attention) and ConvNeXt to enhance representation learning for 3D medical images. In particular, we propose a novel self-attention mechanism crafted to encompass spatial and channel relationships throughout the entire feature dimension. To further extract multiscale features, we introduce a depth-wise convolution block inspired by ConvNeXt after the dual attention block. Extensive evaluations on three benchmark datasets, namely Synapse, BraTS, and ACDC, demonstrate the effectiveness of our proposed method in terms of accuracy. Our SCANeXt model achieves a state-of-the-art result with a Dice Similarity Score of 95.18% on the ACDC dataset, significantly outperforming current methods.[......]
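As a rough, simplified illustration of the two ingredients named above, the NumPy sketch below applies spatial and channel self-attention to a flattened feature map and then a depth-wise convolution (1D for brevity; the actual SCANeXt layers use learned projections and 3D kernels, so this is only a shape-level sketch, not the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention(feat):
    """Toy dual attention over a (C, N) feature map, N = flattened voxels.

    Spatial attention relates positions to positions; channel attention
    relates feature channels to channels; both are added back residually.
    """
    # Spatial attention: N x N affinity between positions.
    spatial = softmax(feat.T @ feat, axis=-1)   # (N, N)
    out_spatial = feat @ spatial.T              # (C, N)
    # Channel attention: C x C affinity between channels.
    channel = softmax(feat @ feat.T, axis=-1)   # (C, C)
    out_channel = channel @ feat                # (C, N)
    return out_spatial + out_channel + feat     # residual sum

def depthwise_conv1d(feat, kernel):
    """Depth-wise convolution: each channel is filtered independently,
    echoing the ConvNeXt-inspired block that follows dual attention."""
    return np.stack([np.convolve(ch, kernel, mode="same") for ch in feat])
```

The key property to note is that neither operation mixes the roles of the two axes: attention weights are computed across the full spatial or channel dimension, while the depth-wise kernel never mixes channels.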

Continue reading

EmTex: Prototyping Textile-Based Interfaces through An Embroidered Construction Kit

As electronic textiles have become more advanced in sensing, actuating, and manufacturing, incorporating smartness into fabrics has become of special interest to ubiquitous computing and interaction researchers and designers. However, innovating smart textile interfaces for numerous input and output modalities usually requires expert-level knowledge of specific materials, fabrication, and protocols. This paper presents EmTex, a construction kit based on embroidered textiles, patterned with dedicated sensing, actuating, and connecting components to facilitate the design and prototyping of smart textile interfaces. With machine embroidery, EmTex is compatible with a wide range of threads and underlay fabrics, proficient in various stitches to control electrical parameters, and capable of integrating versatile and reliable interaction functionalities with aesthetic patterns and precise designs. EmTex consists of 28 textile-based sensors, actuators, connectors, and displays, presented with standardized visual and tactile effects. Along with a visual programming tool, EmTex enables the prototyping of everyday textile interfaces for diverse daily-life scenarios, embodying touch input as well as visual and haptic output properties. With EmTex, we conducted a workshop and invited 25 designers and makers to create freeform textile interfaces. Our findings revealed that EmTex helped the participants explore novel interaction opportunities with various smart textile prototypes. We also identified challenges EmTex must address for practical use in promoting the design innovation of smart textiles.[......]

Continue reading

LayTex: A Design Tool for Generating Customized Textile Sensor Layouts in Wearable Computing

Smart textile sensors have attracted increasing interest in the domain of wearable computing for human motion monitoring. Previous studies have shown that textile sensor layout has a major impact on the effectiveness and performance of wearable prototypes. However, determining textile sensor layout quantitatively remains a tricky and time-consuming issue, as it involves figuring out the number, placement, and even orientations of sensors, and there is no streamlined digital platform or tool specifically addressing this issue. In this paper, we introduce LayTex, a digital tool capable of generating layout proposals for personalized scenarios, which aims to help designers and researchers construct prototypes efficiently. A preliminary evaluation with designers on smart garments for scoliosis indicates that LayTex has great potential to lower the barriers and simplify the process of textile prototype construction.[......]

Continue reading

A Tailored Textile Sensor-based Wrap for Shoulder Complex Angles Monitoring

The shoulder joint plays a crucial role in the recovery of upper limb function. However, conventional wearable technologies employed for monitoring shoulder joint movements predominantly rely on inertial measurement units (IMUs), which may suffer from alignment errors and compromise the freedom and wearability experienced by patients during their daily activities. This paper contributes in two ways. First, it presents the design, implementation, and technical evaluation of a new wearable system, a customized unilateral shoulder wrap that utilizes flexible and breathable textile sensors. Diverging from earlier studies, our system not only facilitates the monitoring of glenohumeral joint angles but also concurrently tracks the movement angles of the scapula. Second, to estimate joint angles, we propose a specific model called the Channel-Temporal Encoding Network (CTEN), which leverages Transformer and Long Short-Term Memory (LSTM) architectures. In a preliminary technical evaluation, the results demonstrate root mean square errors (RMSEs) of 2.24° and 1.13° for the glenohumeral joint and scapula, respectively. This study is intended to contribute to the development of more advanced wearables tailored for shoulder joint rehabilitation training.[......]
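The reported RMSEs follow the standard root-mean-square-error formula over a sequence of predicted versus ground-truth joint angles; a minimal sketch (the function name and sample angles are illustrative, not data from the study):

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and ground-truth joint
    angles (in degrees): sqrt of the mean squared per-sample error."""
    assert len(predicted) == len(actual) and predicted
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, predictions of 30°, 45°, and 60° against ground truth of 32°, 44°, and 63° give an RMSE of sqrt(14/3) ≈ 2.16°, on the order of the glenohumeral error the paper reports.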

Continue reading