Brief Bio

Hi, I am Xiao Luo. I am currently a first-year CSE PhD student at THE Ohio State University, and I am lucky to be advised by Prof. John Paparrizos. My current research focuses on similarity search, especially vector databases and low-level system optimization for databases.
Please feel free to contact me at luo[DOT]1632[AT]osu[DOT]edu

Education

Doctor of Philosophy, Computer Science and Engineering, 2024 - present
THE Ohio State University

Master of Science, Electrical and Computer Engineering, 2022 - 2024
Georgia Institute of Technology, Atlanta, United States

Bachelor of Engineering, Software Engineering, 2018 - 2022
Sichuan University, Chengdu, China

Award

I placed 5th in the 2023 ACM/IEEE TinyML Design Contest at ICCAD.
I did the feature engineering and neural network architecture search, and implemented float16 quantization on the device by hand to save memory.
Code

Skill

Kubernetes, containerd, Golang, and cgroup v2:

For container orchestration, task scheduling, resource metrics, and allocation
Programming Techniques

C/C++, Golang, Python, CUDA (NVSHMEM), Serial Data Processing
Machine Learning

PyTorch and TensorFlow, Transformers and some CV models, DQN and several of its variants
Profiling and analyzing the latency distribution and resource consumption (compute, memory, I/O, etc.) of each part of a large model for efficient learning.
Hugging Face, and work on efficient attention in LLMs.
Statistical machine learning such as SVM, random forest, decision tree, etc.
Linux

Linux commands, Unix sockets, IPC, system settings and maintenance, etc.

Research Experience

Improving the Efficiency of Transformer Models in Mobile Applications

Efficient And Intelligent Computing Lab - Ongoing

Advisor: Prof. Yingyan (Celine) Lin

Explore new efficient attention mechanisms for fine-tuning or inference in LLMs.
Use NVIDIA CUTLASS to implement efficient operators for different kinds of linear, convolution, and activation function operations. Implement operator fusion to avoid unnecessary activation reloading.
Perform extensive model analysis and profiling to build a hardware-aware understanding of transformer-based models.
Perform model quantization on LLMs and diffusion models.
Use compression techniques that respect hardware characteristics to make models lightweight enough to run on mobile devices.

Reinforcement Learning Driven Resource Management and ML Task Offloading in Cloud-Edge Computing

PengCheng National Laboratory

Research Intern 10/2022 ~ 06/2023

Advisor: Dr. Wen Wu

Build the whole system.
Use Kubernetes and Golang to build a distributed system with task scheduling and resource allocation (virtualized CPU and GPU, compute and memory resources).
Use Reinforcement Learning to make resource allocation, task scheduling, and container creation decisions.
Co-schedule multiple object tracking, ORB-SLAM, and feature fusion tasks in an autonomous driving scenario.
CPU, memory, and time-sliced GPU resource metrics and virtualization.
Application execution time prediction for RL scheduling.

My cluster’s Picture

System Architecture

Github: Task Runner | Scheduler | Devices Simulator | Kubernetes Config

Application Profiling in Big Datacenters

Advisor: Dr. Min Luo

Read over 50 papers on resource management in large data centers and summarize their application profiling techniques (profiling applications’ characteristics and predicting their resource consumption).
Survey: Application Profiling in Big Datacenters

Internship Experience

Tencent (Shenzhen, China)

Backend Software Engineer Intern 08/2021 ~ 10/2021

Use Golang, MySQL, and monitoring platforms to develop new features and functions of the system
Committed over 5000 lines of code

Projects

Network Virtualization Resource Management Platform (NVRMP)
This is a computation resource management framework based on Kubernetes for autonomous driving. It receives real-time image streams from vehicles and distributes them to appropriate machines over wireless or wired networks. It uses reinforcement learning to decide task placement and resource allocation. Tasks include object detection, tracking, SLAM, fusion, and path planning. CPU, GPU, and memory resources are virtualized through cgroups and time-sliced GPU virtualization.

Code
Chinese Invention Patent
Kubernetes/Containerd Resource Virtualization Golang/C Raspberry Pi/NVIDIA TX2 PyTorch

This demo showcases an Azure Cloud Digital Twin (DT) platform for monitoring environmental temperatures. A Raspberry Pi equipped with a sensor captures temperature data and uploads it to the Azure Cloud DT platform. On the platform's visual interface, users can view temperature readings from various devices and see their network topology. The platform also supports monitoring device connection status, downloading device log files, and sending specific commands to targeted devices.

Code
Azure Cloud Digital Twin C# Raspberry Pi

This project focuses on designing efficient systems for machine learning and applying them to real-world problems. It achieved 5th place in the TinyML Design Contest at ICCAD, where Neural Architecture Search (NAS) was used to create a compact neural network for heart stroke detection. The project also involved implementing a neural network and its float16 quantization in C, which reduced memory usage for inference on microcontrollers. Additionally, Binary Neural Networks (BNNs) were explored to further reduce memory requirements and make inference on microcontrollers more efficient.

Code
5th at ICCAD TinyML Contest
MCU C/TensorFlow ML Quantization NAS

This project is a data visualization tool based on a simple custom language. It imports data from Excel, network sockets, or other custom interfaces. In this GUI tool, all data interfaces and charts are abstracted as draggable items. Users can connect data sources to charts, such as line, bar, or area graphs, by linking items, enabling real-time data reading and chart rendering. These connections are underpinned by a custom programming language, allowing users to add bespoke data interfaces, like fetching data from HTTP endpoints, through automatically generated code.