<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Computer_science on Xi's Blog</title><link>https://xichen1997.github.io/categories/computer_science/</link><description>Recent content in Computer_science on Xi's Blog</description><generator>Hugo -- 0.154.5</generator><language>en-us</language><lastBuildDate>Sun, 16 Nov 2025 00:06:00 +0000</lastBuildDate><atom:link href="https://xichen1997.github.io/categories/computer_science/index.xml" rel="self" type="application/rss+xml"/><item><title>Kernel comparison with a MMA in CUDA and near-SOTA/cuBLAS performance kernel</title><link>https://xichen1997.github.io/posts/2025-11-17-introduction-to-cuda-and-high-performance-cuda-kernels/</link><pubDate>Sun, 16 Nov 2025 00:06:00 +0000</pubDate><guid>https://xichen1997.github.io/posts/2025-11-17-introduction-to-cuda-and-high-performance-cuda-kernels/</guid><description>&lt;h1 id="kernel-comparison-with-a-mma-in-cuda-and-near-sotacublas-performance-kernel"&gt;Kernel comparison with a MMA in CUDA and near-SOTA/cuBLAS performance kernel&lt;/h1&gt;
&lt;p&gt;The project is hosted in the repository:
&lt;a href="https://github.com/xichen1997/CUDA-refresh"&gt;CUDA-refresh&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The kernel is the &amp;ldquo;kernel&amp;rdquo; (the core) of CUDA programming: it directly influences compute efficiency, and it&amp;rsquo;s the key to taking advantage of the GPU&amp;rsquo;s huge amount of compute resources and bandwidth.&lt;/p&gt;
&lt;p&gt;Here is a quick refresher on CUDA&amp;rsquo;s computation and memory hierarchy and their influence on compute efficiency.&lt;/p&gt;</description></item><item><title>Add array support for DSL on Minimal CPU</title><link>https://xichen1997.github.io/posts/2025-09-19-add-array-support-for-dsl-based-on-minimal-cpu/</link><pubDate>Fri, 19 Sep 2025 00:06:00 +0000</pubDate><guid>https://xichen1997.github.io/posts/2025-09-19-add-array-support-for-dsl-based-on-minimal-cpu/</guid><description>&lt;p&gt;The project is hosted in the repository (section-4):
&lt;a href="https://github.com/xichen1997/minimal_turing_complete_CPU"&gt;minimal_CPU&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Currently, the compiler and DSL support basic operations such as assigning a value, resetting a value, addition, and subtraction. However, one important language feature is still unsupported: arrays. Arrays make it more convenient to read and write strings, and they can be used to build a mini-terminal or shell for interacting with the simulated hardware (CPU).&lt;/p&gt;</description></item><item><title>DSL and Compiler Based on Minimal CPU</title><link>https://xichen1997.github.io/posts/2025-07-14-dsl-and-compiler-based-on-minimal-cpu/</link><pubDate>Mon, 14 Jul 2025 00:06:00 +0000</pubDate><guid>https://xichen1997.github.io/posts/2025-07-14-dsl-and-compiler-based-on-minimal-cpu/</guid><description>&lt;p&gt;The project is hosted in the repository (section-2):
&lt;a href="https://github.com/xichen1997/minimal_turing_complete_CPU"&gt;minimal_CPU&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;After implementing our minimal CPU, we can write machine code or assembly code to run programs. However, these low-level languages are not easy to read or maintain. We need a way to construct a high-level language that can be translated to machine code, making program development more accessible.&lt;/p&gt;
&lt;p&gt;Thus, the purpose of this section is to design a Domain-Specific Language (DSL) and its compiler with a complete build system.&lt;/p&gt;</description></item><item><title>Learn from MVP: Minimal Instruction Set CPU</title><link>https://xichen1997.github.io/posts/2025-05-30-learn-from-mvp-minimal-instruction-set-cpu/</link><pubDate>Fri, 30 May 2025 00:05:00 +0000</pubDate><guid>https://xichen1997.github.io/posts/2025-05-30-learn-from-mvp-minimal-instruction-set-cpu/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The 6502 CPU program has been a great inspiration for understanding the foundations of computer science. It&amp;rsquo;s fascinating how basic boolean functions and transistors can form such a complex and beautiful system. However, even the 6502 CPU, with its 150+ opcodes, can be overwhelming for those trying to understand the fundamental principles of computing.&lt;/p&gt;
&lt;h2 id="the-importance-of-minimal-viable-products"&gt;The Importance of Minimal Viable Products&lt;/h2&gt;
&lt;p&gt;When learning complex systems, it&amp;rsquo;s crucial to start with a minimal viable product (MVP) - understanding the most essential components that make a program run. This approach led me to explore foundational theories and historical concepts in computing.&lt;/p&gt;</description></item><item><title>Running LLM on mac mini clusters, strategy and practice</title><link>https://xichen1997.github.io/posts/2025-02-27-running-llm-on-mac-mini-clusters-strategy-and-practice/</link><pubDate>Fri, 28 Feb 2025 00:05:00 +0000</pubDate><guid>https://xichen1997.github.io/posts/2025-02-27-running-llm-on-mac-mini-clusters-strategy-and-practice/</guid><description>&lt;h1 id="running-llm-on-mac-mini-clusters-strategy-and-practice"&gt;Running LLM on mac mini clusters, strategy and practice&lt;/h1&gt;
&lt;h2 id="data-parallel"&gt;Data Parallel&lt;/h2&gt;
&lt;p&gt;This is the most straightforward strategy, typically used when batch size &amp;gt; 1. It increases throughput by giving the system more data to process simultaneously.&lt;/p&gt;
&lt;h2 id="pipeline-parallel"&gt;Pipeline Parallel&lt;/h2&gt;
&lt;p&gt;Due to VRAM or unified memory limitations in Mac minis, loading the entire model into memory isn&amp;rsquo;t always possible. Pipeline parallel is an effective strategy to reduce memory usage.&lt;/p&gt;
&lt;p&gt;The approach splits the model into several parts, loading them into memory sequentially. When running, it operates like a factory pipeline - data flows through the system as different model parts process it in sequence.&lt;/p&gt;</description></item><item><title>Set up remote access for my backup desktop to serve as remote development machine</title><link>https://xichen1997.github.io/posts/2024-09-07-set-up-remote-access-for-my-backup-desktop-to-serve-as-remote-develoement-machine/</link><pubDate>Sat, 07 Sep 2024 00:05:00 +0000</pubDate><guid>https://xichen1997.github.io/posts/2024-09-07-set-up-remote-access-for-my-backup-desktop-to-serve-as-remote-develoement-machine/</guid><description>&lt;h1 id="background"&gt;Background&lt;/h1&gt;
&lt;p&gt;After I configured my new desktop, the previous one was shut down because of electricity concerns. But my friend told me his apartment had free electricity, so after discussing it with him I decided to move my previous desktop to his living room and set it up as a remote server to reduce the expense of cloud servers.&lt;/p&gt;
&lt;h1 id="steps"&gt;Steps&lt;/h1&gt;
&lt;h2 id="step-1-clean-the-computer"&gt;Step 1: clean the computer&lt;/h2&gt;
&lt;p&gt;This step was quite simple: just download the Ubuntu 24.04 LTS image and install it on the NVMe SSD.&lt;/p&gt;</description></item></channel></rss>