Computer_science on Xi's Blog

Kernel comparison with a MMA in CUDA and near-SOTA/cuBLAS performance kernel

Sun, 16 Nov 2025 00:06:00 +0000

The project is hosted in the repository: CUDA-refresh

Introduction

The kernel is, as the name suggests, the core concept in CUDA. It directly influences compute efficiency, and it is the key to taking advantage of the GPU’s huge amount of compute resources and bandwidth.

Here is a simple refresher on the CUDA compute and memory hierarchy and their influence on computation efficiency.
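
As a rough illustration of why the memory hierarchy matters, the sketch below estimates the arithmetic intensity (FLOPs per byte read from global memory) of a square matrix multiplication. The function name, the fp32 element size, and the tile sizes are assumptions made for this example and are not taken from the post. The model is the usual simplified one: each multiply-add needs one element of A and one of B, and staging tile x tile blocks in shared memory lets every loaded element be reused `tile` times.

```python
def matmul_arithmetic_intensity(tile=1, bytes_per_element=4):
    """Estimate FLOPs per byte of global-memory traffic for C = A @ B.

    Simplified model (an assumption, not a measurement):
    - each inner-product step does 1 multiply + 1 add = 2 FLOPs
    - it needs 1 element of A and 1 element of B
    - with tile x tile blocks kept in shared memory, every element
      fetched from global memory is reused `tile` times
    """
    flops_per_step = 2
    bytes_per_step = 2 * bytes_per_element / tile
    return flops_per_step / bytes_per_step


if __name__ == "__main__":
    for tile in (1, 16, 32):
        print(tile, matmul_arithmetic_intensity(tile))
```

With `tile=1` (no reuse) the estimate is 0.25 FLOP per byte, so such a kernel is limited by memory bandwidth rather than by compute; larger tiles raise the ratio proportionally.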

Add array support for DSL on Minimal CPU

Fri, 19 Sep 2025 00:06:00 +0000

The project is hosted in the repository (section-4): minimal_CPU

Introduction

Currently, the compiler and DSL support basic operations such as assigning a value, resetting a value, addition, and subtraction. However, one important language feature is still missing: arrays. Arrays make it more convenient to read and write strings, and they can be used to build a mini-terminal or shell for interacting with the simulated hardware (CPU).

DSL and Compiler Based on Minimal CPU

Mon, 14 Jul 2025 00:06:00 +0000

The project is hosted in the repository (section-2): minimal_CPU

Introduction

After implementing our minimal CPU, we can write machine code or assembly code to run programs. However, these low-level languages are not easy to read or maintain. We need a way to construct a high-level language that can be translated to machine code, making program development more accessible.

Thus, the purpose of this section is to design a Domain-Specific Language (DSL) and its compiler with a complete build system.

Learn from MVP: Minimal Instruction Set CPU

Fri, 30 May 2025 00:05:00 +0000

Introduction

The 6502 CPU program has been a great inspiration for understanding the foundations of computer science. It’s fascinating how basic boolean functions and transistors can form such a complex and beautiful system. However, even the 6502 CPU, with its 150+ opcodes, can be overwhelming for those trying to understand the fundamental principles of computing.

The Importance of Minimal Viable Products

When learning complex systems, it’s crucial to start with a minimal viable product (MVP) - understanding the most essential components that make a program run. This approach led me to explore foundational theories and historical concepts in computing.

Running LLM on mac mini clusters, strategy and practice

Fri, 28 Feb 2025 00:05:00 +0000

Data Parallel

This is the most straightforward strategy, typically used when batch size > 1. It increases throughput by giving the system more data to process simultaneously.
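
A minimal sketch of the idea, assuming a toy stand-in for the model: the batch is split into shards, each worker runs the same full "model" on its own shard, and the outputs are concatenated in the original order. The names `split_batch`, `toy_model`, and `data_parallel_forward` are hypothetical, and threads stand in for the separate machines of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batch(batch, num_workers):
    """Split a batch into num_workers contiguous shards of near-equal size."""
    base, extra = divmod(len(batch), num_workers)
    shards, start = [], 0
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


def toy_model(x):
    # Stand-in for a forward pass; every worker holds a full copy of it.
    return 2 * x + 1


def data_parallel_forward(batch, num_workers):
    """Run toy_model over the batch with one shard per worker."""
    shards = split_batch(batch, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in the same order as the input shards.
        results = pool.map(lambda shard: [toy_model(x) for x in shard], shards)
        return [y for shard_out in results for y in shard_out]


if __name__ == "__main__":
    print(data_parallel_forward([1, 2, 3, 4, 5], num_workers=2))
```

Note that every worker needs the whole model in memory, which is why this strategy helps throughput but not memory usage.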

Pipeline Parallel

Due to VRAM or unified memory limitations in Mac minis, loading the entire model into memory isn’t always possible. Pipeline parallel is an effective strategy to reduce memory usage.

The approach splits the model into several consecutive parts, each loaded into the memory of a different machine. When running, it operates like a factory pipeline - data flows through the system as the model parts process it in sequence.
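
The factory-pipeline behaviour can be sketched with a small simulation. Everything here is hypothetical and for illustration only: the "layers" are simple arithmetic functions, `partition_layers` splits them into contiguous stages, and `pipeline_forward` simulates a schedule in which stage `s` works on micro-batch `t - s` at tick `t`, so different stages are busy with different micro-batches at the same time.

```python
def make_layer(scale, shift):
    """Return a toy layer computing scale * x + shift."""
    return lambda x: scale * x + shift


def partition_layers(layers, num_stages):
    """Split the layer list into num_stages contiguous groups."""
    base, extra = divmod(len(layers), num_stages)
    stages, start = [], 0
    for i in range(num_stages):
        size = base + (1 if i < extra else 0)
        stages.append(layers[start:start + size])
        start += size
    return stages


def run_stage(stage, x):
    """Apply every layer of one stage in order."""
    for layer in stage:
        x = layer(x)
    return x


def pipeline_forward(stages, micro_batches):
    """Simulate the pipeline; return final outputs and the per-tick schedule."""
    num_stages = len(stages)
    activations = list(micro_batches)  # current value of each micro-batch
    schedule = []
    for tick in range(len(micro_batches) + num_stages - 1):
        active = []
        for s in range(num_stages):
            m = tick - s  # micro-batch handled by stage s at this tick
            if 0 <= m < len(micro_batches):
                activations[m] = run_stage(stages[s], activations[m])
                active.append((s, m))
        schedule.append(active)
    return activations, schedule


if __name__ == "__main__":
    layers = [make_layer(2, 0), make_layer(1, 3), make_layer(3, 0), make_layer(1, -1)]
    stages = partition_layers(layers, num_stages=2)
    outputs, schedule = pipeline_forward(stages, [1, 2])
    print(outputs)
    for tick, active in enumerate(schedule):
        print(tick, active)  # (stage, micro-batch) pairs active at this tick
```

Each stage only needs its own layers in memory, which is where the memory saving comes from; the first and last ticks, where some stages sit idle, show the pipeline filling and draining.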

Set up remote access for my backup desktop to serve as remote development machine

Sat, 07 Sep 2024 00:05:00 +0000

Background

After I configured my new desktop, the previous one sat unused because of electricity costs. Then a friend told me that electricity at his apartment is free, so after discussing it with him I decided to move the old desktop to his living room and set it up as a remote server to reduce the expense of cloud servers.

Steps

Step 1: clean the computer

This step was quite simple: download the Ubuntu 24.04 LTS image and install it on the NVMe SSD.