
Articles

A set of random thoughts and ideas I've gathered over the years.

Feel free to reach out and let me know how crazy or stupid they are.

Last change: , commit: f57c640

On Hardware Patches.


                                                                            08/03/2026

Earlier this week, someone asked me how hardware vulnerabilities like Spectre (the first and second variants) and Meltdown get fixed.

And while I do not recall my exact response, I am under the impression that my answer was close to this: “Well, that can’t be fixed, it’s hardware.”

My prevailing assumptions at the time were:

  • Hardware, once described in RTL, is final; signals move from point A to point B along fixed paths, so to fix a vulnerability you must change the RTL.1
  • Assembly-language binary encodings are fixed and final; therefore they are the only way to effect a function whose output affects the observable execution environment.

The truth is slightly more involved than that black-and-white declaration: “…it can’t be fixed in software.”

  • By this I mean the belief that any mitigation would have to be released as a hardware update.

In the event of a hardware vulnerability, where the loophole exists in the behavior of actual functional components made up of circuits, it is true that the “real” solution has to be a physical one.

Alternatively, certain mitigations can be offered logically. This is where compilers and operating systems come in. The OS’s mitigation lies in policing the programs it is asked to execute, i.e. the memory locations they access and the signals they send, among a whole slew of other techniques I am still unaware of.

As for the compiler, it polices the types of operations it is allowed to perform on the program’s data during compilation. So the solutions come in three forms, all of them updates/patches: in software, in hardware and in microcode.

Yes, there is such a thing as microcode.
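The compiler/software side of this policing can be made concrete. Below is a minimal sketch, in Rust, of one well-known software mitigation for Spectre variant 1 (bounds-check bypass): masking an index so that even a mispredicted branch cannot form an out-of-bounds address. The helper is modeled loosely on the Linux kernel's `array_index_nospec`; the Rust version here is an illustrative assumption, not a vetted production mitigation.

```rust
// Illustrative sketch of index masking against Spectre v1 (assumed
// names; modeled after the Linux kernel's array_index_nospec).

// All-ones when idx < len, zero otherwise, computed WITHOUT a branch
// so speculation cannot pick the wrong side.
fn array_index_nospec(idx: usize, len: usize) -> usize {
    let mask = ((idx as isize).wrapping_sub(len as isize) >> (isize::BITS - 1)) as usize;
    idx & mask
}

fn load(table: &[u8], idx: usize) -> u8 {
    if idx < table.len() {
        // Even if the branch above is mispredicted, the masked index
        // stays inside the table, so no secret-dependent wild load.
        table[array_index_nospec(idx, table.len())]
    } else {
        0
    }
}

fn main() {
    let table = [10u8, 20, 30, 40];
    assert_eq!(load(&table, 2), 30);
    assert_eq!(load(&table, 99), 0);
    println!("masked loads ok");
}
```

The point is the branchless mask: the compiler (or careful programmer) trades a data dependency for a control dependency the CPU could otherwise speculate through.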


  1. RTL - Register Transfer Level

Coder’s fright

[DISCLAIMER] This is a pre-release version of an insufficiently tested method that I am espousing. I will expound on it when it’s done. If you have variations of whatever I am trying to put across, feel free to open a PR.

Battling your inner demons

You know, in all the programming lore I have come across, none of it talks about how difficult it can be to approach a new codebase.

Ok, it is likely that my observation is biased by the fact that I am a beginner at this and the individuals who get to talk about this stuff are experts. Perhaps the fear fades with experience.

Since the space that mathematics can describe is essentially infinite, I think a codebase’s capacity to frighten can be quantified by a function whose parameters are the relative size of the codebase and the average number of new concepts encountered per module.

Of course other things factor in, like one’s mastery of the language the codebase is written in and the degree of specificity1 that the language allows.

The size of the codebase obviously determines how long you have to spend nested in it to get a firm grasp. New concepts establish the necessity to be explored, especially if they are heavily referenced or used in the implementation. These create detours that can often be deemed unnecessary since they don’t help in meeting the deadline.

I’ll probably come back to the utility of such detours but as for now let us explore our newfound fear.

So perhaps you are trying to be a systems engineer, and there you are, staring at a backend that executes asynchronously by default and has also decided to use a thread pool in its implementation.

If you are sufficiently green, you have a vague idea of what asynchronous execution means, and if you are lucky you have been exposed to it through languages like Go. You are unaware of synchronization primitives, and your working assumption is that all parallelized computation is in optimal working condition. Thread pools, to you, are akin to unicorns.

Strategizing

At this point the codebase is like a predator and you are its prey. Its primary intent is to crush your spirit and remind you of your inefficiencies. It shouts of your incompetence.

You start looking into asynchrony. When you scratch its surface you come across the notion of a runtime, an executor, and you learn of a way to mimic the operating system’s scheduler.

Now, to understand the scheduler you have to go a layer deeper. It is there that you get a proper introduction to computer organization and architecture. If you heed the call of adventure you get thrown into a world filled with ISAs. The scales fall from your eyes, and what emerges is the realisation that “the world is way larger than the computer you are hunched over”.

If you are a grown-up, you cease the pursuit and retrace your steps. You have work to do.

Personally, that is where my kind of trouble begins, because I have always found that going a layer deeper yields a better picture.

Now, if you are going to understand a codebase, there is getting familiar with the “literature”, and then there is getting to understand the implications of that “literature” as applied or implemented.

Grasping the implications requires a whole road trip. You play out the serial execution as it occurs in the codebase and verify the correctness or quality of the logic that emerges from the order of execution.

Getting to that point alone can be immensely satisfying, but now there is a kick to this kick: you want to modify the logic to enhance either performance or usability, perhaps by adding an extension.

If you have ever traced a circuit using probes, maybe you’ll relate to this. Adding an extension to the codebase is similar to knowing the exact places to tap in a closed circuit: knowing exactly which points to attach to in order to get the expected output.

The real kick is probably in enhancing performance. To enhance performance you have to break down the implementation detail.

You go from an operation that was simply “responding to requests for files” to picturing threads (async or otherwise) that listen for requests, make a malloc call to allocate a buffer, read a file into that buffer, and transmit it through a socket. Now you have to figure out where you are wasting time, and why. Once you find out, you make a point of touching memory less often, if memory was the bottleneck, or you switch the allocator altogether. If that’s not enough and you have sufficient memory, you open certain frequently accessed files early to avoid several consecutive syscalls.

There is probably a trick about keeping sockets alive to escape repeated handshakes, but anyway. Those who are serious enough switch to io_uring.
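The "open hot files early" idea above can be sketched in a few lines of Rust. `FileCache` below is a hypothetical name for illustration: it pays the open/read syscalls once, up front, for known-hot paths, and serves later requests from memory instead of hitting the filesystem each time.

```rust
// Hedged sketch: preload frequently accessed files to avoid repeated
// open/read syscalls on the serving path. Names are illustrative.
use std::collections::HashMap;
use std::fs;
use std::io;

struct FileCache {
    hot: HashMap<String, Vec<u8>>,
}

impl FileCache {
    // Pay the syscall cost once, at startup, for known-hot paths.
    fn preload(paths: &[&str]) -> io::Result<Self> {
        let mut hot = HashMap::new();
        for p in paths {
            hot.insert(p.to_string(), fs::read(p)?);
        }
        Ok(FileCache { hot })
    }

    // Serve from memory when possible; fall back to a read syscall.
    fn get(&self, path: &str) -> io::Result<Vec<u8>> {
        match self.hot.get(path) {
            Some(bytes) => Ok(bytes.clone()),
            None => fs::read(path),
        }
    }
}

fn main() -> io::Result<()> {
    let p = std::env::temp_dir().join("hot.txt");
    fs::write(&p, b"cached contents")?;
    let path = p.to_str().unwrap().to_string();

    let cache = FileCache::preload(&[path.as_str()])?;
    assert_eq!(cache.get(&path)?, b"cached contents");
    println!("served from cache");
    Ok(())
}
```

A real server would add invalidation and bound the cache size; the sketch only shows the syscall-amortization idea.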

I believe that it is in the pursuit of this kick that optimization engineers live.


  1. Certain languages, particularly low-level ones, allow you more control over how your data will be computed. This is achieved mostly through compiler intrinsics or lints, like those that enforce a particular data alignment.

Shorts


A selection of random thoughts that do not exactly fit as articles.

On the guts of computing

From what I have gathered so far, I think this is it. The core attribute of computers is this: they are made up of transistors, and those transistors are combined in ways that capture an essential function conveying some logic. These logics can in turn be combined in several ways that convey more abstract capabilities, letting us effectively describe concepts of much greater breadth and depth. We essentially managed to break complexity down to bits.
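That "one essential function, combined" claim can be shown in miniature: NAND is a universal gate, and every other gate falls out of combining it. A small sketch in Rust, with gates modeled as plain functions:

```rust
// A single "essential function": NAND, universal for boolean logic.
fn nand(a: bool, b: bool) -> bool {
    !(a && b)
}

// Every other gate is just NANDs combined.
fn not(a: bool) -> bool { nand(a, a) }
fn and(a: bool, b: bool) -> bool { not(nand(a, b)) }
fn or(a: bool, b: bool) -> bool { nand(not(a), not(b)) }
fn xor(a: bool, b: bool) -> bool {
    // classic 4-NAND XOR construction
    let n = nand(a, b);
    nand(nand(a, n), nand(b, n))
}

fn main() {
    assert_eq!(and(true, false), false);
    assert_eq!(or(true, false), true);
    assert_eq!(xor(true, true), false);
    println!("gates built from NAND alone");
}
```

In silicon the same story holds one layer down: each NAND is itself a handful of transistors.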

This pattern is also observable in ideological constructs: several concepts within a particular domain tend to stem from a central core.

Let this collapse into the domain of religion. If what we as humanity describe as God is a being we are incapable of fathoming, are we in essence creating the tool that will enable us to picture what that idea means?

The power of perspective

[22/03/26]

Often, I imagine what having access to certain kinds of information affords you. If you are in the technology space, the information you have access to determines the strides you are willing to take to make an idea work, and whether or not you are willing to go through with a particular plan of action. It is in these instances that Musk comes to mind. Whatever he has done is basically unheard of. Not to revere the guy, but it’s just facts. He leads in space exploration, electric vehicle design, rehabilitation of quadriplegics, and internet access services. We haven’t considered what they are doing with Boring and SolarCity; let’s not even talk about X. It is likely there is a company I am forgetting.

He has access to the best in those fields and he talks to them on a regular basis. I mean, you can’t just dismiss the guy. Anyone who is an expert in their field is acutely aware of how narrow their point of view is. With this guy, I can’t help but feel he is always trying to fill the gap of unknown unknowns.

This is a perspective that happens to capture both humility and hubris in the same tone, because it presumes the ability to comprehend all of what is known, but also the awareness that, regardless of how much one has been made aware of, there is still more left unexplored.

These are the modern day renaissance men.

So if we shift that focus to the field of computing, we find equivalents of such characters in Joran Dirk Greef, TigerBeetle’s founder, and my personal favorites Chris Lattner, Jim Keller and Jordan Peterson. The other is Iain McGilchrist.


Appendices

Some things you may find useful.


Digital Logic

Constructs used in digital circuit optimization

  • Commutative
  • Associative
  • Distributive
  • Absorption
  • Combining
  • De Morgan’s Laws
  • Consensus
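These laws can be spot-checked exhaustively over booleans. A small sketch in Rust, expressing each law with the bitwise operators `&`, `|` and `!` and checking every input combination:

```rust
// Exhaustively verify the classic boolean-algebra identities used in
// digital circuit optimization.
fn main() {
    for a in [false, true] {
        for b in [false, true] {
            for c in [false, true] {
                assert_eq!(a & b, b & a);                   // commutative
                assert_eq!((a & b) & c, a & (b & c));       // associative
                assert_eq!(a & (b | c), (a & b) | (a & c)); // distributive
                assert_eq!(a | (a & b), a);                 // absorption
                assert_eq!((a & b) | (a & !b), a);          // combining
                assert_eq!(!(a & b), !a | !b);              // De Morgan
                // consensus: the (b & c) term is redundant
                assert_eq!((a & b) | (!a & c) | (b & c), (a & b) | (!a & c));
            }
        }
    }
    println!("all identities hold");
}
```

Each identity is exactly the rewrite a logic optimizer applies to shrink a gate network.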

Core

  • Control Flow Graphs
  • Static Single Assignment (SSA)
  • Data Flow Analysis
  • Dominator Trees - Block A dominates B if every path to B goes through A
  • Loop Analysis
  • Constant Propagation
  • Dead Code Elimination
  • Common Subexpression Elimination
  • Code Motion
  • Strength reduction
  • Inlining
  • Instruction scheduling
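One of the passes above, constant propagation (in its simplest form, constant folding), can be sketched on a toy expression IR. `Expr` and `fold` are hypothetical names for illustration; real compilers do this over SSA form rather than a tree:

```rust
// Toy constant folding: collapse subtrees whose operands are all
// constants, the kernel of a constant-propagation pass.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Const(x), Expr::Const(y)) => Expr::Const(x + y),
            (a, b) => Expr::Add(Box::new(a), Box::new(b)),
        },
        Expr::Mul(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Const(x), Expr::Const(y)) => Expr::Const(x * y),
            (a, b) => Expr::Mul(Box::new(a), Box::new(b)),
        },
        c => c,
    }
}

fn main() {
    // (2 + 3) * 4 folds to a single constant at "compile time".
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Const(2)), Box::new(Expr::Const(3)))),
        Box::new(Expr::Const(4)),
    );
    assert_eq!(fold(e), Expr::Const(20));
    println!("folded to 20");
}
```

Dead code elimination then removes any definitions the folded constants made unreachable.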

LLVM-IR

An overview of some of LLVM’s internals, illustrating their usage and syntax.

Categories

  • Attributes
  • Linkages
  • Terminator instructions
  • Metadata
  • Debug Information

For a complete reference of the IR, look up LLVM_IR

If you have been keen on the developments in the compiler tech world, you’ll know that compiler infrastructure is increasingly converging on MLIR. And no wonder: it eliminates the need to design a separate code generator for every language that relies on its own custom IR.

Nor can it be overstated how much utility it provides in a single codebase, especially the optimization and verification passes.

So this effort explores how various high-level language constructs get lowered to the IR.

Examples

SSA features

This example showcases the constraints laid out by SSA.

fn greater_than(a: u32, b: u32) -> bool {
    if a > b {
        true
    } else {
        false
    }
}

fn main() {
    let mut a = 6;
    let mut b = 8;
    if greater_than(a, b) {
        a += 1;
    } else {
        b += 1;
    }
    println!("{b}");
}

This is a purely subjective view. Static Single Assignment (SSA) is an approach to tracking variable definitions that removes the need to track multiple reassignments to the same variable.

At first it is hard to see exactly what problem this solves and why it lies at the core of the IR, but a little examination quickly illustrates why it is useful.

Compilers generally have to keep track of every variable used in an expression, along with its definition. This matters because the compiler needs a proper picture of what exactly the program is doing in order to perform modifications without altering the observed behavior of the program, i.e. what is visible in the execution environment.

Basically, all modifications, no matter how radical, ought to maintain the semantics of the program.

I find the fact that we managed to do this a very impressive feat.

So, in order to generate efficient assembly (basically assembly that cannot be much improved when hand-written), multiple analyses have to be performed on the program at several levels. The constraints placed on the compiler writer are partly something they sign up for when designing the language features. If you are writing a dynamic language where variables can be freely reassigned, the challenge is keeping track of the changes to variables in a way that stays consistent as they are accessed and used. This is effectively like repeatedly aiming at, and necessarily having to hit, a moving target.

Compiled languages, or by and large systems languages, and their compiler authors have an added advantage in the following sense: with a low-level language, the compiler writer has the opportunity to expose some of the target architecture’s complexity through the interface, so the programmer gets to handle whatever warts come with it. This also means the programmer is given more control in such settings. But I digress.

SSA imposes a constraint: a variable may be assigned only once, at its definition. This touches upon the issue of use-def chains.

Basically, this eases the dataflow analyses that have to be performed on the program. By the way, this constraint applies regardless of the scope in which the parts of the program are being analyzed.

   define i1 @greater_than(i32 %a, i32 %b) {
     start:
       %cmp = icmp ugt i32 %a, i32 %b
       br i1 %cmp, label %if_true, label %if_false

     if_true:
       ret i1 true

     if_false:
       ret i1 false
   }

   define void @main() {
     start:
       %0 = alloca i32
       %1 = alloca i32
       store i32 6, ptr %0
       store i32 8, ptr %1
       %a = load i32, ptr %0
       %b = load i32, ptr %1
       %cmp = call i1 @greater_than(i32 %a, i32 %b)
       br i1 %cmp, label %if_greater, label %if_less

     if_greater:
       %_0 = load i32, ptr %0
       %_01 = add i32 %_0, 1
       store i32 %_01, ptr %0
       ret void

     if_less:
       %_1 = load i32, ptr %1
       %_11 = add i32 %_1, 1
       store i32 %_11, ptr %1
       ret void
   }

Actual rustc output.

   ; great::main
   ; Function Attrs: nonlazybind uwtable
   define hidden void @_RNvCs3TUjdV8qTsL_5great4main() unnamed_addr #1 {
   start:
     %_13 = alloca [16 x i8], align 8
     %args = alloca [16 x i8], align 8
     %b = alloca [4 x i8], align 4
     %a = alloca [4 x i8], align 4
     store i32 6, ptr %a, align 4
     store i32 8, ptr %b, align 4
     %_4 = load i32, ptr %a, align 4
     %_5 = load i32, ptr %b, align 4
   ; call great::greater_than
     %_3 = call zeroext i1 @_RNvCs3TUjdV8qTsL_5great12greater_than(i32 %_4, i32 %_5)
     br i1 %_3, label %bb2, label %bb4
   
   bb4:                                              ; preds = %start
     %0 = load i32, ptr %b, align 4
     %_7.0 = add i32 %0, 1
     %_7.1 = icmp ult i32 %_7.0, %0
     br i1 %_7.1, label %panic, label %bb5
   
   bb2:                                              ; preds = %start
     %1 = load i32, ptr %a, align 4
     %_6.0 = add i32 %1, 1
     %_6.1 = icmp ult i32 %_6.0, %1
     br i1 %_6.1, label %panic1, label %bb3
   
   bb5:                                              ; preds = %bb4
     store i32 %_7.0, ptr %b, align 4
     br label %bb6
   
   panic:                                            ; preds = %bb4
   ; call core::panicking::panic_const::panic_const_add_overflow
     call void @_RNvNtNtCsgc7BJoiPOQP_4core9panicking11panic_const24panic_const_add_overflow(ptr align 8 @alloc_9d4dbda1dd74df7697d1e3a0acc956d8) #9
     unreachable
   
   bb6:                                              ; preds = %bb3, %bb5
   ; call <core::fmt::rt::Argument>::new_display::<u32>
     call void @_RINvMNtNtCsgc7BJoiPOQP_4core3fmt2rtNtB3_8Argument11new_displaymECs3TUjdV8qTsL_5great(ptr sret([16 x i8]) align 8 %_13, ptr align 4 %b) #7
     %2 = getelementptr inbounds nuw %"core::fmt::rt::Argument<'_>", ptr %args, i64 0
     call void @llvm.memcpy.p0.p0.i64(ptr align 8 %2, ptr align 8 %_13, i64 16, i1 false)
   ; call <core::fmt::Arguments>::new::<4, 1>
     %3 = call { ptr, ptr } @_RINvMs2_NtCsgc7BJoiPOQP_4core3fmtNtB6_9Arguments3newKj4_Kj1_ECs3TUjdV8qTsL_5great(ptr align 1 @alloc_61247b90e1706a3f65e71312b599d3d1, ptr align 8 %args) #7
     %_9.0 = extractvalue { ptr, ptr } %3, 0
     %_9.1 = extractvalue { ptr, ptr } %3, 1
   ; call std::io::stdio::_print
     call void @_RNvNtNtCskKV3BO88lSU_3std2io5stdio6__print(ptr %_9.0, ptr %_9.1)
     ret void
   
   bb3:                                              ; preds = %bb2
     store i32 %_6.0, ptr %a, align 4
     br label %bb6
   
   panic1:                                           ; preds = %bb2
   ; call core::panicking::panic_const::panic_const_add_overflow
     call void @_RNvNtNtCsgc7BJoiPOQP_4core9panicking11panic_const24panic_const_add_overflow(ptr align 8 @alloc_7b5440927130137bf397d791bde43b7e) #9
     unreachable
   }
   

If you are keen, you’ll notice that no defined variable is ever re-assigned. In SSA there is no concept of shadowing: each defined variable is the only instance of its initialization, and reusing it requires the instantiation of a new variable.
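This discipline is easy to mimic at the source level: give each "assignment" its own immutable binding, exactly as SSA renames one mutable variable into a series of versions. A sketch, with the corresponding SSA-style names in the comments:

```rust
// Source-level imitation of SSA: one binding per "assignment".
fn main() {
    let a0 = 6;      // %a0 = 6
    let a1 = a0 + 1; // %a1 = add %a0, 1
    let a2 = a1 * 2; // %a2 = mul %a1, 2
    assert_eq!(a2, 14);
    println!("a2 = {a2}");
}
```

Every use now points at exactly one definition, which is what makes use-def chains and the dataflow analyses mentioned earlier cheap to compute.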

Terminator instructions

The terminator instructions are: ‘ret’, ‘br’, ‘switch’, ‘indirectbr’, ‘invoke’, ‘callbr’ ‘resume’, ‘catchswitch’, ‘catchret’, ‘cleanupret’, and ‘unreachable’.

ret

Return control, and optionally a value, from a function.

_syntax
```text
  ret <type> <value> ; return a value
  ret void           ; return from a void function
```

_use-cases
```llvm-ir
  ret void
  ret i64 %_0
  ret i32 0
  ret { i32, i8 } { i32 4, i8 2 } ; a struct with fields of i32 and i8
```
The return type must be a first-class type, so basically one of the primitive types `i8`, `i32`, `i64`, `float`, etc. If it is an aggregate type, like a struct of some sort, it must be delineated to illustrate the individual types within the aggregate.

br

branch conditionally or unconditionally from a block.

_syntax

   ;conditional branch
   br i1 <cond>, label <iftrue>, label <iffalse>
   
   ; unconditional branch
   br label %block

_use-cases

   br label %bb14
   br i1 %_2, label %bb2, label %bb3

switch

_syntax

   switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ]

_use-cases

   switch i64 %_2, label %bb1 [
       i64 0, label %bb5
       i64 1, label %bb4
       i64 2, label %bb3
       i64 3, label %bb2
   ]

From the langref :- “Depending on properties of the target machine and the particular switch instruction, this instruction may be code generated in different ways. For example, it could be generated as a series of chained conditional branches or with a lookup table.”

Linkages

These describe how global values (functions and global variables) are treated by the linker, e.g. whether a symbol is visible outside the module or may be merged with identical definitions.
