Hello! Now that I am back into the swing of learning and writing again, I thought I would share some recent research with you. My thinking is that sharing my work will help me keep some structure in my personal research, and may help my readers with theirs as well.
In this post I will try to explain what I learned about Linux processes: their definition, their creation, and how to find more information about them. This will be a more theoretical post than some of what you might find on this blog, but I will try to offer some application of the concepts as well.
Definition:
What exactly is a process, how is it different from a program, and how do threads “tie” into it? To start, I’m going to offer a proper definition from one of the best books I’ve ever found when it comes to the inner workings of Linux.
“The Linux Programming Interface” by Michael Kerrisk:
https://nostarch.com/tlpi
In this book, Kerrisk defines a process as “an abstract entity, defined by the kernel, to which system resources are allocated in order to execute a program” (page 114).
This is worth keeping in mind, especially when it comes to understanding the difference between a thread and a process. Specifically, a process is a collection of resources: open files, memory blocks, and so on. Threads are the flows of execution that act on those resources within the process.
If we think about a program as a set of instructions for a computer to follow, then at a simple conceptual level a process is the act of following those instructions. First, though, the computer must take a few actions before it can begin walking through a program and producing its desired effects. Specifically, the program has to be loaded into memory in such a way that the machine can interpret it as instructions.
Note: I will not be getting into the Linux boot process, as I believe that would make a good post unto itself and would overly complicate this one.
Entering a command:
When a command to start an application is entered into a terminal, one of two things may occur. The first is that the shell itself may have the functionality built into its own code and handle the task directly. This is known as a builtin command, and from what I understand it is implemented this way to cut back on the overhead of frequently used functionality.
The other way a program becomes a process is through a set of system calls (functions provided by the kernel that let userspace programs request actions and resources). The two system calls (or syscalls) involved are known as fork() and exec(). There are actually a few different variations on exec, but they all carry out the same general task.
Creating an instance:
To carry out this task, the calling program first makes a duplicate of itself in memory. This new copy is identical to the calling process, although its reference ID, known as its process ID or PID, is different; it is a separate running process at this point. To prove this point I found a couple of really interesting little practice programs at: https://ece.uwaterloo.ca/~dwharder/icsrts/Tutorials/fork_exec/
You can play with them yourself, but essentially the idea is to fork the current process and then check the PIDs that result. The neat part is where execution continues after the fork: playing with these programs, you can see that the child process carries on from the line of code right after the call to fork(), just as the parent does.
Here is a small example:
#include <sys/types.h>
#include <unistd.h>

pid_t pid = fork();
What I find interesting is the use of the pid variable to steer execution based on whether the code is running in the child or the parent process: fork() returns 0 in the child, the child's PID (a positive number) in the parent, and -1 if the fork failed. This can be applied as such:
if (pid == 0)
    execvp("<command>", <args>);
This could also be handled with a switch statement, where case 0 is the child's execution flow, case -1 is error handling, and default is the parent's execution. All of this is described in much better detail in chapters 24-28 of “The Linux Programming Interface” by Michael Kerrisk; again, I really recommend it to anyone who is curious.
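The switch-based layout described above could be sketched roughly like this. The helper name fork_and_identify() is my own invention for illustration, and the child here simply prints its PIDs and exits rather than doing any real work:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the book or the linked tutorial):
 * forks once and branches on the return value of fork().
 * Returns the child's PID to the parent, or -1 if fork() failed.
 * The child never returns from this function. */
pid_t fork_and_identify(void)
{
    /* Flush stdio first so buffered output is not duplicated in the child. */
    fflush(NULL);

    pid_t pid = fork();

    switch (pid) {
    case -1:
        /* Error: no child process was created. */
        perror("fork");
        return -1;
    case 0:
        /* Child: fork() returned 0, getpid() gives the child's real PID. */
        printf("child:  pid=%d parent=%d\n", (int)getpid(), (int)getppid());
        exit(EXIT_SUCCESS);
    default:
        /* Parent: fork() returned the PID of the new child. */
        printf("parent: pid=%d child=%d\n", (int)getpid(), (int)pid);
        return pid;
    }
}
```

The caller is expected to collect the child afterwards, for example with waitpid(pid, &status, 0), so that it does not linger as a zombie.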
New process data:
From this point we know how to create an instance for our new process to exist in. We now need a way to change the newly created process into something more useful than a copy of its parent process. As mentioned earlier, this is done through a family of functions known as exec.
One of the first things you may notice about exec is that there are different versions (execve, execl, execvp, etc.). On Linux, execve is the underlying system call, and the others are library wrappers around it. Wrappers like this are common in Linux; they act a bit like function overloading, allowing multiple variations on the same general functionality. You might consider writing your own wrappers if you find yourself frequently using a library function in a specific way; I’ll try to get into the details of that in a different post.
Now back to exec. This function replaces the memory map of the current child process (remember that a fork call duplicates the parent process in memory) with that of the new program. The usage is simpler than it seems, but for me it took some play time to wrap my head around.
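To make the idea of exec variations a little more concrete, here is a minimal sketch that runs the same command through two of them: execlp(), which takes its arguments as a list, and execvp(), which takes them as an array. The helper name run_echo() and the choice of the echo command are my own assumptions for illustration, and the exit code 127 on failure simply borrows the shell's convention.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs "echo hello" in a child process.
 * use_list_form != 0 -> execlp(), arguments passed one by one
 * use_list_form == 0 -> execvp(), arguments passed as an array
 * Returns the child's exit status, or -1 on error. */
int run_echo(int use_list_form)
{
    fflush(NULL);                 /* avoid duplicated stdio buffers */

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        if (use_list_form) {
            /* List form: the argument list ends with a NULL pointer. */
            execlp("echo", "echo", "hello", (char *)NULL);
        } else {
            /* Vector form: the same arguments in a NULL-terminated array. */
            char *args[] = {"echo", "hello", NULL};
            execvp("echo", args);
        }
        /* Only reached if the exec call failed. */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both calls search PATH for the program (that is what the trailing "p" means), so run_echo(1) and run_echo(0) should behave the same on a system where echo is installed.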
Here is an example:
char *args[] = {"sleep", "20", NULL};
execvp("sleep", args);
Walking through this code, we first declare an array of pointers, initialized in the order of the bracketed values: specifically “sleep”, “20”, and a NULL pointer. The first is the name of the program; if this confuses you, think about the line:
int main( int argc, char* argv[])
Focusing on argv, the first element in the array is the name of the program; this array is the new process's argument list. “20” is the first “working” argument, meaning the new program will take an action based on this parameter (and any others, if there are more). Finally, the NULL pointer acts as a stop sign so that nothing reads beyond allocated memory, much like the terminating null byte in C-style strings.
The last things you should know before I give you a block of code to play with are the functions exit() and wait(). You may have used exit() in your code already, but it has a neat property. The parent can wait for the child process to terminate through the wait() call, which provides some basic synchronization. Through exit(), the child can report an exit status to the parent, which the parent retrieves through wait(). You will see this in action in the example program below.
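The “stop sign” role of the NULL pointer can be illustrated with a small sketch that walks an argument array until it hits the terminator. The helper names count_args() and print_args() are hypothetical, made up for this example:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts the entries that come before the
 * terminating NULL pointer, the same way strlen() counts characters
 * before the terminating null byte. */
size_t count_args(char *const argv[])
{
    size_t n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}

/* Hypothetical helper: prints each argument with its index. */
void print_args(char *const argv[])
{
    for (size_t i = 0; argv[i] != NULL; i++)
        printf("argv[%zu] = %s\n", i, argv[i]);
}
```

Passing the array {"sleep", "20", NULL} to count_args() gives 2: the program name and one working argument. The NULL itself is not counted.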
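Before the full program, here is a smaller sketch of just the exit() and wait() hand-off. The helper name child_exit_status() is my own, and I use waitpid() (a variation on wait() that targets one specific child) so the example behaves predictably even if other children exist:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that immediately calls exit(code),
 * then returns the status the parent sees. Returns -1 on error. */
int child_exit_status(int code)
{
    fflush(NULL);                 /* avoid duplicated stdio buffers */

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0)
        exit(code);               /* child: report a status and terminate */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    /* status packs several pieces of information together;
     * WEXITSTATUS() extracts the value the child passed to exit(). */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                    /* child was terminated some other way */
}
```

One thing to keep in mind is that only the low 8 bits of the value survive the trip, so a child calling exit(256) shows up in the parent as 0.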
Example code:
Here is some basic code you can play with:
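#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == -1) {
        /* fork() failed: no child process was created. */
        perror("fork");
        exit(EXIT_FAILURE);
    }

    if (pid == 0) {
        /* Child: pid is 0 here, getpid() returns the child's real PID. */
        printf("Child Process: %d - %d\n", (int)pid, (int)getpid());
        printf("Starting\n");
        char *argv[] = {"sleep", "20", NULL};
        execvp("sleep", argv);
        /* Never printed when execvp() succeeds, because this code has
           been replaced by sleep. Getting here means the exec failed. */
        printf("Completed\n");
        exit(EXIT_FAILURE);
    } else {
        /* Parent: pid holds the child's PID. */
        printf("Parent Process: %d - %d\n", (int)pid, (int)getpid());
        int status = 1;
        pid_t c_pid = wait(&status);   /* blocks until the child terminates */
        if (WIFEXITED(status))
            printf("Parent Process: child process %d returned: %d\n",
                   (int)c_pid, WEXITSTATUS(status));
    }
    return 0;
}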
Take your time and play with it; try modifying the different arguments and predicting the output. One thing that I found helpful during this play session was to make the program sleep() at interesting points, just long enough to copy elements from /proc/<pid>/ and compare the differences before and after the exec call. One “file” that may be of particular interest is /proc/<pid>/statm, which according to https://man7.org/linux/man-pages/man5/proc.5.html “Provides information about memory usage, measured in pages”. I believe this provides a simple way to verify that the process in memory has changed without getting into the complexities of extracting and examining the actual bytes.
One interesting thing that I noticed when checking /proc/<pid>/statm is that the output after the exec call to sleep exactly matches the output of /proc/<pid>/statm when running sleep from bash. Obviously sleep has a simple function (if not a simple implementation), but I am now curious whether similar memory usage patterns could be used to identify an obfuscated process. That, of course, is beyond what I intended to get into in this post; it is just something I noticed while writing and am curious about.
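If you would rather read statm from code than copy it by hand, something along these lines might work. The helper name read_statm() is hypothetical; it only looks at the first two fields, which proc(5) describes as the total program size and the resident set size, both in pages:

```c
#include <stdio.h>

/* Hypothetical helper: reads the first two fields of /proc/<pid>/statm.
 * pid is passed as a string so that "self" can be used for the calling
 * process. Returns 0 on success, -1 on failure. */
int read_statm(const char *pid, long *size_pages, long *resident_pages)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%s/statm", pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;                /* no such process, or /proc unavailable */

    /* Field 1: total program size, field 2: resident set size (pages). */
    int matched = fscanf(f, "%ld %ld", size_pages, resident_pages);
    fclose(f);

    return matched == 2 ? 0 : -1;
}
```

Calling read_statm("self", &size, &resident) once before the exec, and then reading the same PID's statm from another terminal afterwards, gives two sets of numbers to compare.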
For now though, I would like to say thank you for reading. I hope this post was able to provide some insight; or at least give some direction for further learning. In regards to the content for my next post, I am still unsure, but don’t doubt I will stumble upon something interesting to share with you soon.
Cheers,
ElliePenguins