Andrei Gudkov
About me

What it takes to transpose a matrix, 2024

In this introductory article we will take a look at the challenges encountered when developing efficient matrix algorithms for CPUs. Our goal is to create such an algorithm for the matrix transpose problem, targeting x86_64. We will focus primarily on dealing with high memory latency and, later, on code vectorization. Along the way, developers may learn about CPU architecture issues and common techniques for handling them.

We are hiring! Algorithm research engineer, Huawei Cloud, 2020

Explaining machine learning pitfalls to managers, 2019

Linux tools

Setting up LXC on Debian desktop, 2019

Linux power tools: get and search source code in two minutes, 2019

String algorithms

Efficient implementation of MinHash, part 1, 2019

Rationale behind Boyer-Moore algorithm, 2018

Common pitfalls of using TCP, 2019

Connect timeouts, buffer sizes, ephemeral ports.

Guide to making high-quality thumbnails, 2019

Multistep scaling, unsharp masking, HiDPI.

Dangers of linking inline functions, 2018

The worst thing that can happen: the wrong function body is called.

Advanced Hadoop topics

Let's solve some real-world problems, see why by-the-book solutions do not work, and learn how to fix them.

Practical Hadoop, episode 1: JOIN, 2017

Practical Hadoop, episode 2: SELECT, 2017

Practical Hadoop, episode 3: Towards Stability, 2018

Practical Hadoop: FAQ, 2019

Project organization

Why skill match is not enough, 2017

Server-side programmers should support their software themselves, 2017

Scrum dysfunction, 2017

Storage subsystem performance: analysis and recipes, 2016

A long read about the storage subsystem, with images!

The article provides overall coverage of the storage subsystem, with the main focus on performance. It is split into theoretical and practical parts. The theoretical part is dedicated to the components of the IO stack, with particular attention to modern data storage devices: HDD and SSD. The theory of operation provides the basis for explaining the performance advantages and limitations of each device; real-world test results are included as well. The practical part lists various methods of performance improvement and also gives hands-on advice on everyday tasks. The reader is expected to have prior experience with programming and system administration in a Linux environment.

Supplementary benchmarking tool: drvperf