FOR-450 Adv. Malware Analysis

2025 Fall

2025 Aug 27

Notes

Reverse Engineering in Software

High level (similar to English, readable) vs low level language (computer understands, human generally doesn't)
- Python (high)
- ASM (low)
Ex: C, C++, Java, Python (relatively high) --> compile --> binary (low level)
Note: not all use compilation, some use interpreters

Example of c code:

int main() {
    int variable = 1;
    return 0;
}

Program binaries are just a bunch of gibberish and not readable. Ex:

high level languages to machine code
- Translation process:
  - Input source code files (c, c++)
  - Use compiler (gcc, clang)
  - Output machine code file executable by cpu
Modern compiler optimizations
- Minimize code size, improve execution performance, higher efficiency
- Impact on reverse engineering:
  - Straightforward instructions --> mathematically equivalent but obscure instructions
  - Optimized code becomes difficult to read
  - Original program logic gets buried under optimizations

Disassemblers
Input: binary file
Output: asm code
Must consider the target architecture
- some disassemblers support multiple architectures
Can disassemble the entire program or specific parts
types of disassemblers
- standalone disassemblers - dedicated tools with features
- built-in disassemblers - embedded in debuggers

Decompilers
Binary back to high level code (ex: binary --> c)
- ex:
  - Input: binary
  - Output: example.c
Limitations
- original source code recovery usually not possible
Decompiler vs Disassembler
- Disassembler --> asm
- Decompiler --> high level code (c, cpp, etc)

CPU has a few major parts
- ALU
- CU (control unit)
- many registers
  - registers are much faster than ram
Advantages
- fastest data access method
- central to asm operations
Limitations
- very few available
- short-term only
- manual management
Programmers must manually
- load data from ram --> registers
- store data from registers --> ram
- manage register space

Binary, decimal, hex, etc
In C, int, short, long, float, etc
Negative numbers
- idk man ill teach myself
- nvm stackoverflow taught me goated website

I made a gif explaining it found here.

Intel uses CISC - Complex Instruction Set Computer
- Many special-purpose instructions (most of which you likely won't ever see)
- Variable-length instructions, 1-15 bytes long
Other major architectures
- RISC - Reduced Instruction Set Computer
- typically more registers, fewer and fixed-size instructions

8 general purpose registers + instruction pointer (points at next instruction to execute)
x86-32 registers are 32 bits long
x86-64 registers are 64 bits long

registers
- EAX - stores function return values
- EBX - base pointer to data section
- ECX - counter for string and loop operations
- EDX - i/o pointer
- ESI - Source pointer for string operations
- EDI - Destination pointer for string operations
- ESP - Stack pointer
- EBP - Stack frame base pointer
- EIP - Pointer to next instruction to execute (“instruction pointer”)
caller-save registers (eax, edx, ecx)
- if something in registers needs to be stored, the caller is in charge of saving the value before calling a subroutine and restoring the values after the call returns
- caller-save registers are likely to be modified
callee-save registers (ebp, ebx, esi, edi)
- if the callee needs to use these registers, the callee must save their values first and restore them before returning
EFLAGS
- register holds many single bit flags
- zero flag (zf) - set if the result of some instruction is zero
- sign flag (sf) - set equal to most-significant bit of the result, which is the sign bit of a signed integer (0 = positive, 1 = negative)
instructions
- NOP - do nothing
  - used to pad/align bytes or delay time
  - can be used to make exploits more reliable

the stack is a conceptual area of main memory (ram) which is designated by the os for programs
last-in-first-out (LIFO/FILO) data structure
by convention the stack grows toward lower memory addresses
when adding to the stack, the "top" of the stack moves to a lower memory address
ESP points to the top of the stack (the lowest address in use)
the stack keeps track of which functions were called before the current one. it holds local variables and is used to pass arguments to the next function
