Linux Process

Hello! Now that I am back into the swing of learning and writing again, I thought I would share some recent research with you. My thinking is that sharing my work will help me keep some structure in my personal research, and may help my readers in theirs as well.

In this post I will try to explain what I learned about Linux processes: their definition, their creation, and how to find more information about them. This will be a more theoretical post than some of what you might find on this blog, but I will try to offer some application of the concepts as well.

Definition:

What exactly is a process, how is it different from a program, and how do threads “tie” into it? To start, I’m going to offer a proper definition from one of the best books I’ve ever found when it comes to the inner workings of Linux.

“The Linux Programming Interface” by Michael Kerrisk:
https://nostarch.com/tlpi

In this book, Kerrisk defines a process as “an abstract entity, defined by the kernel, to which system resources are allocated in order to execute a program” (page 114).

This is something I think is good to keep in mind, especially when it comes to understanding the difference between a thread and a process. Specifically, a process is a collection of resources: think open files, memory blocks, and so on. Threads are the flows of execution that act on those resources within the process.

If we think about a program as a set of instructions for a computer to follow, then at a simple conceptual level a process is the act of following those instructions. First, though, the computer must take a few actions before it can begin walking through a program and producing its desired effects. Specifically, it must load the program into memory in such a way that the machine is able to interpret it as instructions.

Note: I will not be getting into the Linux boot process, as I believe that would make a good post unto itself and would overly complicate this one.

Entering a command:

When a command to start an application is entered into a terminal, one of two things may occur. The first is that the shell itself may have the functionality built into its own code and handle the task itself. This is known as a builtin, and from what I understand builtins exist to cut back on the overhead of frequently used functionality.

The second way a program becomes a process is through a set of system calls (functions provided by the kernel through which userspace programs request actions and resources). The two system calls (or syscalls) are known as fork() and exec(). There are actually a few different variations on exec, but know that they carry out the same general task.

Creating an instance:

To carry out this task, the calling process first makes a duplicate of itself in memory. The new process is identical to the calling process, except that its identifier, known as its process ID or PID, is different; it is a separate running process at this point. To prove this point I found a couple of really interesting little practice programs at: https://ece.uwaterloo.ca/~dwharder/icsrts/Tutorials/fork_exec/

You can play with them yourself, but essentially the idea is to fork the current process and then check the PIDs that are created. The neat thing about it is where execution continues after the fork procedure. Playing with these programs you can see that execution in the child process continues from the next line of code after the call to fork().
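One way to convince yourself that the child really is a separate copy is to let it change a variable and then look at the parent's copy. The sketch below is my own illustration rather than one of the linked tutorial programs; the function name parent_value_after_child_write is made up for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, lets the child overwrite its copy of
   `value`, and returns what the parent still sees afterwards.
   Returns -1 if fork() or waitpid() fails. */
int parent_value_after_child_write(void) {
    int value = 1;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: execution continues right here, after fork(). */
        value = 99;                    /* changes only the child's copy */
        _exit(value == 99 ? 0 : 1);    /* leave without running parent code */
    }

    /* Parent: wait for the child, then inspect our own copy. */
    if (waitpid(pid, NULL, 0) == -1)
        return -1;
    return value;                      /* untouched by the child's write */
}
```

Calling this from main() and printing the result should show that the parent's value is unchanged, because the child's write happened in its own copy of memory.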

Here is a small example:
#include <sys/types.h>
#include <unistd.h>

pid_t pid = fork();

What I find interesting is the use of the pid variable to determine execution flow based on whether it is in the child or the parent process: fork() returns 0 in the child and the child’s PID (a positive number) in the parent. This can be applied as such:

if (pid == 0)
     execvp("<command>", <args>);

This could also be handled with a switch statement, where 0 is the child’s execution flow, -1 is error handling, and default is the parent’s execution. All of this is described in much better detail in chapters 24-28 of “The Linux Programming Interface” by Michael Kerrisk; again, I really recommend it to anyone who is curious.
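As a rough sketch of the switch approach (my own illustration, not code from the book), the three cases might be laid out like this. The function name fork_with_switch is an assumption made up for the example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks once and branches on fork()'s return value.
   Returns the child's PID in the parent, or -1 if fork() failed. */
pid_t fork_with_switch(void) {
    fflush(stdout);              /* keep buffered output from being duplicated */
    pid_t pid = fork();

    switch (pid) {
    case -1:                     /* error: no child was created */
        perror("fork");
        return -1;
    case 0:                      /* child: fork() returned 0 */
        printf("child:  my PID is %d\n", (int)getpid());
        fflush(stdout);
        _exit(0);                /* leave without running the parent's code */
    default:                     /* parent: fork() returned the child's PID */
        printf("parent: my PID is %d, child is %d\n",
               (int)getpid(), (int)pid);
        waitpid(pid, NULL, 0);   /* reap the child */
        return pid;
    }
}
```

The two printed PIDs should differ, and the value returned to the parent is the same PID the child reports for itself.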

New process data:

From this point we know how to create an instance for our new process to exist in. We now need a way to change the newly created process into something more useful than a copy of its parent process. As mentioned earlier, this is done through a family of functions known as exec.

One of the first things you may notice in regards to exec is that there are different versions (execve, execl, execvp, etc.). On Linux, execve is the underlying system call and the others are library wrappers around it, a pattern that is common in Linux. It acts similarly to function overloading, allowing multiple variations on the same general functionality. You might consider writing your own wrappers if you find yourself frequently using a library function in a specific way; I’ll try to get into the details of that in a different post.

Now back to exec. This function replaces the memory map of the current process (here the child; remember that a fork call duplicates the parent process in memory) with the new program. The usage is simpler than it seems, but for myself it took some play time to wrap my head around.

Here is an example:

char *args[] = {"sleep", "20", NULL};
execvp("sleep", args);

Walking through this code, we first declare an array of pointers, each of which is initialized in the order given in the braces: specifically “sleep”, “20”, and a NULL pointer. The first is the name of the program; if this confuses you, then think about the line:

int main( int argc, char* argv[])

Focusing on argv, the first element in the array is the name of the program; this array is the new program’s argument list. “20” is the first “working” argument, meaning the new program will take an action based on this parameter (and any others, if there are more). Finally, the NULL pointer acts as a stop sign so that nothing reads beyond the end of the array, much like the terminating null byte in a C-style string.
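To make the “stop sign” idea concrete, here is a minimal sketch of how code can walk an argument list without being told its length. The helper name count_args is hypothetical; it is only meant to illustrate the NULL sentinel, not how any particular program parses its arguments.

```c
#include <stddef.h>

/* Hypothetical helper: counts the entries in a NULL-terminated
   argument vector, the same layout execvp() expects. */
int count_args(char *const argv[]) {
    int n = 0;
    while (argv[n] != NULL)   /* stop at the sentinel, not at a known length */
        n++;
    return n;
}
```

For the array {"sleep", "20", NULL} this counts two entries: the program name and its one working argument. Without the trailing NULL, the loop would have no way of knowing where the array ends.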

The last things you should know before I give you a block of code to play with are the functions exit() and wait(). You may have used exit() in your code already, but it has a neat property. The parent can wait for the child process to terminate through the wait() call, which provides some basic synchronization. Through exit(), the child can report an exit status to the parent, which the parent retrieves through wait(). You will see this in action in the example program below.
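One detail worth knowing is that the value filled in by wait() is not the exit code itself; it is a packed status that has to be decoded with macros such as WIFEXITED() and WEXITSTATUS(). The sketch below is my own illustration, with a made-up function name, and uses waitpid() so that it waits for one specific child.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with `code` and returns
   the exit status as the parent sees it. Returns -1 if fork() or
   waitpid() fails, or if the child did not exit normally. */
int exit_status_seen_by_parent(int code) {
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0)
        _exit(code);                  /* child: report `code` and stop */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status))           /* e.g. the child was killed by a signal */
        return -1;
    return WEXITSTATUS(status);       /* low 8 bits of the child's exit code */
}
```

Only the low 8 bits of the exit code reach the parent, so a child that exits with 259 is seen as having exited with 3.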

Example code:

Here is some basic code you can play with:

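#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
     pid_t pid = fork();

     if ( pid == -1 ) {
          /* fork() failed: no child process was created. */
          perror("fork");
          exit(EXIT_FAILURE);
     } else if ( pid == 0 ) {
          /* Child: fork() returned 0, getpid() is our real PID. */
          printf("Child Process: %d - %d\n", (int)pid, (int)getpid());
          printf("Starting\n");
          fflush(stdout);     /* flush before this program is replaced */
          char* argv[] = {"sleep", "20", NULL};
          execvp("sleep", argv);
          /* Only reached if execvp() fails; on success this code is gone. */
          perror("execvp");
          exit(EXIT_FAILURE);
     } else {
          /* Parent: fork() returned the child's PID. */
          printf("Parent Process: %d - %d\n", (int)pid, (int)getpid());
          int status = 1;
          pid_t c_pid = wait(&status);
          /* Decode the packed status into the child's exit code. */
          if ( WIFEXITED(status) )
               printf("Parent Process: child process %d returned: %d\n",
                    (int)c_pid, WEXITSTATUS(status));
     }
     return 0;
}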

Take your time and play with it; try modifying the different arguments and predicting the output. One thing that I found helpful during this play session was to make the program sleep() at interesting points, just long enough to copy elements from /proc/<pid>/ and compare the differences before and after the exec call. Specifically, one “file” of interest to you may be /proc/<pid>/statm, which according to https://man7.org/linux/man-pages/man5/proc.5.html “Provides information about memory usage, measured in pages”. I believe this provides a simple way to verify that the process in memory has changed without getting into the complexities of extracting and examining the actual bytes.
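If you would rather read statm from code than copy it by hand, something along these lines should work on a Linux system with /proc mounted. This is a minimal sketch: read_statm is a name I made up, and it only reads the first two fields, which proc(5) describes as the total program size and the resident set size, both in pages.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: reads the first two fields of /proc/<pid>/statm.
   Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_statm(pid_t pid, long *size_pages, long *resident_pages) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/statm", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    /* Fields are space separated; we only need the first two. */
    int fields = fscanf(f, "%ld %ld", size_pages, resident_pages);
    fclose(f);
    return fields == 2 ? 0 : -1;
}
```

Calling read_statm(getpid(), &size, &resident) just before the execvp() call, and then reading the same file from another terminal while sleep is running, gives two sets of numbers to compare.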

One interesting thing that I did notice when checking /proc/<pid>/statm is that the output after the exec call to sleep exactly matches the output of /proc/<pid>/statm when running sleep from bash. Obviously sleep has a simple function (not implementation), but I am now curious whether similar memory usage patterns could be used to identify an obfuscated process. That of course is beyond what I intended to get into in this post, but it is something I noticed while writing and a curiosity.

For now though, I would like to say thank you for reading. I hope this post was able to provide some insight, or at least give some direction for further learning. As for the content of my next post, I am still unsure, but don’t doubt I will stumble upon something interesting to share with you soon.

Cheers,
ElliePenguins

RAID Playground

Hello friends, it has been some time since my last post, which I am now realizing was over a year ago. Maybe current world events have altered my ability to sense time? Anyway, I hope you all have been well.

My thoughts are that this post will act as a bit of a bump start to get me blogging again. Warm the engines a little before diving into something a little more complex. To begin, I would like to start sharing with you some of the ways that I studied computer concepts before post-secondary education. For this first post I will detail how to create a lab environment for playing with RAID.

What is RAID:

RAID is an acronym for “Redundant Array Of Independent Disks,” although I have also heard people say “Redundant Array of Inexpensive Disks.” Whatever you choose to call it is up to you.

RAID allows disks to work together as a single storage pool that can both improve performance, by spreading the I/O across multiple disks, and add a bit of protection against data loss. This is done by splitting the data into chunks and spreading those chunks across the disks, which is known as striping. In levels such as RAID 5, the array also stores parity information calculated from the data on the other disks, which allows that data to be recovered in the event of a disk failure within the pool. This is however a very high level explanation of the concept; hopefully I will be able to dig deeper into the underlying mechanism in the future.

Note: There are multiple configurations, such as plain striping (RAID 0), mirroring (RAID 1), and nested levels such as RAID 10, but we are only setting up a play environment in this post. You can play with different configs on your own.
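To give a feel for how recovery can work, here is a toy sketch of XOR parity, the idea behind RAID 5 style redundancy. It is my own simplified illustration with made-up function names, using two data blocks and one parity block; it is not how the kernel's md driver is actually written.

```c
#include <stddef.h>

/* Hypothetical helper: computes a parity block as the XOR of two data blocks. */
void make_parity(const unsigned char *a, const unsigned char *b,
                 unsigned char *parity, size_t len) {
    for (size_t i = 0; i < len; i++)
        parity[i] = a[i] ^ b[i];
}

/* Hypothetical helper: rebuilds a lost data block from the surviving
   data block and the parity block. XOR is its own inverse, so
   (a ^ b) ^ b gives back a. */
void rebuild_block(const unsigned char *survivor, const unsigned char *parity,
                   unsigned char *out, size_t len) {
    for (size_t i = 0; i < len; i++)
        out[i] = survivor[i] ^ parity[i];
}
```

If the “disk” holding block a is lost, calling rebuild_block() with block b and the parity block reproduces a byte for byte. The same idea extends to more disks by XORing all of the surviving blocks together.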

Let’s Start:

To begin, open a terminal and create a directory to work in. This is where we will be creating our “disks.” These “disks” will actually be files, just big blocks of zeros on the current file system that will be used to simulate real disks. They are exposed to the system as block devices called loop devices.

Quick Warning: be careful when working with these tools, as they can cause unintended complications if you accidentally enter the wrong arguments, including data loss and system failure. Most of the commands below need root privileges. You should also be aware of your system’s current storage configuration; specifically, does it already have any RAID devices running on it?
Here is a link that can give you some more info on how to check:
https://www.cyberciti.biz/faq/how-to-check-raid-configuration-in-linux/

In your working directory, create the storage files with:

 dd if=/dev/zero of=disk1 bs=1024 count=10240

You should now have a file in the current working directory named disk1 that is 10 MiB in size.

Because we are using RAID level 5 for this example, do this 2 more times, naming the files disk2 and disk3.

The next thing we will need to do is find a free loop device node for each of our disk files and attach the file to it.

To find a free node, use the command:

 losetup -f

If there are no errors, the next step is to attach the disk file to the device that was printed (here /dev/loop0), as such:

 losetup /dev/loop0 disk1

You will need to repeat both of those commands 2 more times, in order. The reason is that losetup -f reports the first unused device node, so each node must be attached to a disk file before asking for the next one.

Once this is complete, you can verify the config with:

 losetup -a

which should reply with something along the lines of:

/dev/loop1: []: (/home/user/raid/disk2)
/dev/loop2: []: (/home/user/raid/disk3)
/dev/loop0: []: (/home/user/raid/disk1)

If you are wondering why we do this as opposed to just using a virtual machine with virtual disks, which is common when learning: this method allows you to create and delete “disks” without stepping out of the environment. On a physical system, it also lets you add disks or change partitioning schemes on the fly when you want to play with a different or more complex config. You could also try applying this method in a script to create, test, and destroy everything within the current environment for learning, play or testing.

Now you should be able to see the current block devices available to the system with the command:

 lsblk

which should look something like:

NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0    7:0    0   10M  0 loop
loop1    7:1    0   10M  0 loop
loop2    7:2    0   10M  0 loop
sda      8:0    0  128G  0 disk
├─sda1   8:2    0   80G  0 part /
├─sda2   8:3    0    8G  0 part [SWAP]
└─sda3   8:4    0   40G  0 part /home

If you see the loop devices that we created then it is time to create our array, yay!

To do this, use the command:

 mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 \
      /dev/loop0 /dev/loop1 /dev/loop2

Now check that it is created with:

 mdadm --detail /dev/md0 --scan

Or by checking in:

 /proc/mdstat

If all works well you can see the details of the array with the command:

 mdadm --detail /dev/md0

Which should reply with information similar to:

/dev/md0:
Version : 1.2
Raid Level : raid5
Array Size : 16384 (16.00 MiB 16.78 MB)
Used Dev Size : 8192 (8.00 MiB 8.39 MB)
Raid Devices : 3
Total Devices : 3

State : clean 
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Consistency Policy : resync

Number Major Minor RaidDevice State
0 7 0 0 active sync /dev/loop0
1 7 1 1 active sync /dev/loop1
3 7 2 2 active sync /dev/loop2
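The Array Size above follows from simple arithmetic: in RAID 5, one device's worth of space is used for parity, so the usable space is the per-device size multiplied by one less than the number of devices. Here is a small sketch of that calculation; the function name is made up, and the sizes are in KiB to match the mdadm output. Note that the Used Dev Size (8192 KiB) is smaller than our 10 MiB files, which I assume is due to space reserved for metadata and alignment.

```c
/* Hypothetical helper: usable capacity of a RAID 5 array in KiB.
   One device's worth of space holds parity, so only (devices - 1)
   devices contribute to capacity. Returns -1 if there are too few
   devices for parity to make sense. */
long raid5_capacity_kib(int devices, long per_device_kib) {
    if (devices < 2)
        return -1;
    return (long)(devices - 1) * per_device_kib;
}
```

With 3 devices and 8192 KiB used per device this gives 16384 KiB, matching the 16 MiB Array Size reported above.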

Running lsblk again should return something similar to:

NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
loop0     7:0    0   10M  0 loop
└─md0     9:0    0   16M  0 raid5
loop1     7:1    0   10M  0 loop
└─md0     9:0    0   16M  0 raid5
loop2     7:2    0   10M  0 loop
└─md0     9:0    0   16M  0 raid5
sda       8:0    0  128G  0 disk
├─sda1    8:2    0   80G  0 part  /
├─sda2    8:3    0    8G  0 part  [SWAP]
└─sda3    8:4    0   40G  0 part  /home

Your new, entirely software RAID environment should now be ready to play with. You can try creating a new file system on /dev/md0 with:

 mkfs.ext4 /dev/md0

and mount with:

 mount /dev/md0 /mnt

Or, if you are curious, layer in LVM, which provides the ability to carve the large pool of disks into smaller logical volumes for specific usages within a larger system. Using RAID in combination with LVM is an incredibly powerful storage solution that is important to understand.

Some fun ideas might also include failing disks, creating and adding new disks, removing disks, and resizing the file system or, if you chose to use LVM, the volume groups. Other ideas you could try might be moving the array of disks to a new box or virtual machine by transferring the disk files and reassembling the array on the new machine. Practising concepts like this in advance can allow you to act fast when required.

Here are some commands to get you started:

simulate failed drive:

 mdadm --fail /dev/md0 /dev/loop0

add another disk, remember to create it with dd and losetup:

 mdadm --add /dev/md0 /dev/loop5

remove a disk (it must first be marked as failed, or be a spare); don’t forget to modify the file system:

 mdadm --remove /dev/md0 /dev/loop5

Hopefully this will provide you with a good test environment to practice managing storage.

Best of luck
ElliePenguins.