What it takes to transpose a matrix, 2024
In this introductory article we look at the challenges of developing efficient matrix algorithms for CPUs. Our goal is to create such an algorithm for the matrix transpose problem, targeting x86_64. We focus primarily on dealing with high memory latency and, later, on code vectorization. Along the way, developers may learn about CPU architecture issues and common techniques for handling them.
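To give a flavor of the memory latency problem, here is a minimal sketch of loop tiling (cache blocking), one widely used way to make a transpose friendlier to the cache. It is an illustration under assumptions, not necessarily the algorithm the article arrives at: the function name `transpose_blocked`, the row-major `float` layout, and the `block` parameter are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from the article): transpose a
 * rows x cols row-major matrix `src` into `dst`, which is stored as a
 * cols x rows row-major matrix. The matrix is walked in block x block
 * tiles so that both the lines being read and the lines being written
 * have a better chance of staying in cache while a tile is processed. */
void transpose_blocked(const float *src, float *dst,
                       size_t rows, size_t cols, size_t block)
{
    assert(block > 0);

    for (size_t i0 = 0; i0 < rows; i0 += block) {
        /* Clamp the tile so edge tiles do not run past the matrix. */
        size_t i1 = (i0 + block < rows) ? i0 + block : rows;

        for (size_t j0 = 0; j0 < cols; j0 += block) {
            size_t j1 = (j0 + block < cols) ? j0 + block : cols;

            /* Plain element-by-element transpose inside one tile. */
            for (size_t i = i0; i < i1; i++)
                for (size_t j = j0; j < j1; j++)
                    dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

A suitable `block` value depends on the cache sizes of the target CPU and is best found by measurement; the sketch makes no claim about which value is optimal and does not attempt vectorization.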
Linux tools
String algorithms
Common pitfalls of using TCP, 2019
Connect timeouts, buffer sizes, ephemeral ports.
Guide to making high-quality thumbnails, 2019
Multistep scaling, unsharp masking, HiDPI.
Dangers of linking inline functions, 2018
The worst thing that can happen: the wrong function body is called.
Advanced Hadoop topics
Let's solve some real-world problems, see why by-the-book solutions do not work, and learn how to fix them.
Project organization
Storage subsystem performance: analysis and recipes, 2016
A long read about the storage subsystem, with images!
The article provides broad coverage of the storage subsystem, with the main focus on performance. It is split into theoretical and practical parts. The theoretical part is dedicated to the components of the IO stack, with particular attention to modern data storage devices: HDD and SSD. The theory of operation provides the basis for explaining the performance advantages and limitations of each device; real-world test results are included as well. The practical part lists various methods of performance improvement and also gives hands-on advice about everyday tasks. The reader is expected to have previous experience of programming and system administration in a Linux environment.
Supplementary benchmarking tool: drvperf