For the kernel hacking class that I’m in, it looks like I’m going to be adding support for turnstiles to the Linux kernel. Turnstiles come from OpenSolaris, where they fill the role that wait queues play for semaphores in Linux. Turnstiles differ from wait queues because they also take priority into account when selecting the process that will receive the lock. This helps solve the priority inversion problem. I’ll also need to come up with a way to prevent process starvation.
Solaris Locking Primitives
March 6, 2006
Mutexes – There are two types of mutexes currently implemented in the kernel: spin mutexes and adaptive mutexes. Adaptive mutexes either spin or block when a lock is held, depending on what state the holder of the lock is in. Spin locks have to be used in high-level interrupt handlers, because blocking would force a context switch.
Condition Variables – Condition variables are kernel-thread-level synchronization primitives. They are implemented as a structure, and they use a mutex to provide the locking ability. The mutex is passed into the functions as necessary; it is not part of the condition variable structure.
Semaphores – The semaphores included in Solaris are the standard counting semaphores that we’re used to using. They should be used in situations where the speed of mutexes and rwlocks isn’t necessary. The fact that they are counting semaphores makes them a natural fit for resource allocation and deallocation. They also use standard sleep queues, unlike mutexes, which use turnstiles when processes block on a given resource.
Contracts – These are a much more complicated synchronization primitive. Their main purpose is to help enrich the relationship between a process and the system resources. These system resources can include pages and threads, and contracts can also be used for asynchronously reporting errors to a process. This primitive depends on the contracts file system to provide an interface to user-level processes. Contract types are used to implement the different relationships between processes and system resources.
Turnstiles – Turnstiles are another synchronization primitive. They provide blocking and wakeup support, plus priority inheritance, for other synchronization primitives, including mutexes and rwlocks. This is very similar to the Linux kernel wait queue, but the difference is that a turnstile helps prevent priority inversion. It does this by supporting priority inheritance and by not being a standard FIFO data structure like the Linux kernel wait queue.
Dispatcher Locks – This is a type of kernel locking primitive that is specific to the Solaris scheduler. There are two types of these locks: plain spin locks, and locks that can also raise the interrupt priority level. They are used almost exclusively in the scheduler and see extremely limited use in other parts of the kernel. They are needed because you cannot handle interrupts while you are modifying the dispatch queues. This is very similar to the Linux spin lock variants that also disable interrupts. A dispatcher lock is a one-byte data structure that can also disable IRQs, and it is implemented in assembly language on all platforms.
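To make the turnstile versus wait queue difference above a bit more concrete, here is a toy userspace sketch in C. It is not the Solaris turnstile code or the Linux wait queue code; every name in it (toy_queue, toy_wake_fifo, toy_wake_priority, toy_inherited_priority) is made up for illustration, and it assumes that a larger number means a more important thread. It only shows the two ideas: waking the highest-priority waiter instead of the oldest one, and letting the lock holder inherit the priority of its most important waiter.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Hypothetical waiter record: larger priority = more important. */
struct waiter {
    int id;
    int priority;
};

/* Waiters are stored in arrival order. */
struct toy_queue {
    struct waiter w[MAX_WAITERS];
    size_t count;
};

/* Add a waiter at the tail. Returns 0 on success, -1 if the queue is full. */
int toy_enqueue(struct toy_queue *q, int id, int priority)
{
    if (q->count == MAX_WAITERS)
        return -1;
    q->w[q->count].id = id;
    q->w[q->count].priority = priority;
    q->count++;
    return 0;
}

/* Remove the entry at idx, keep arrival order, return its id. */
static int toy_remove(struct toy_queue *q, size_t idx)
{
    assert(idx < q->count);
    int id = q->w[idx].id;
    for (size_t i = idx + 1; i < q->count; i++)
        q->w[i - 1] = q->w[i];
    q->count--;
    return id;
}

/* FIFO wakeup: the oldest waiter wins. Returns -1 if nobody waits. */
int toy_wake_fifo(struct toy_queue *q)
{
    return q->count ? toy_remove(q, 0) : -1;
}

/* Priority wakeup: the most important waiter wins; ties go to the oldest. */
int toy_wake_priority(struct toy_queue *q)
{
    if (q->count == 0)
        return -1;
    size_t best = 0;
    for (size_t i = 1; i < q->count; i++)
        if (q->w[i].priority > q->w[best].priority)
            best = i;
    return toy_remove(q, best);
}

/* Priority inheritance: the holder runs at least as high as its top waiter. */
int toy_inherited_priority(const struct toy_queue *q, int holder_priority)
{
    int p = holder_priority;
    for (size_t i = 0; i < q->count; i++)
        if (q->w[i].priority > p)
            p = q->w[i].priority;
    return p;
}
```

With a low-priority waiter queued ahead of a high-priority one, toy_wake_fifo hands the lock to the low-priority thread first, while toy_wake_priority picks the high-priority one. Note that the priority version can starve low-priority waiters if important threads keep arriving, so a real design would need something extra on top of this.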
Using MDB, the Modular Debugger
February 27, 2006
MDB, the Modular Debugger, is used for extremely low-level debugging. It falls into the category of debuggers that are used in combination with core files and the platform assembly language to diagnose and correct problems. Its main use is analyzing the state of the program that you are running at the assembly language level.
MDB allows programmers to implement modules that will execute commands while you are debugging your programs. Modules for kernel and device driver debugging are already included with Solaris. Using the kernel debugging modules you can do the following things:
• Locate a kernel thread’s memory
• Print a picture of a kernel stream
• Determine the type of structure that an address points to
• Detect memory leaks
• Locate stack traces
MDB currently supports debugging and examining the following targets.
• User level processes
• User level core files
• Viewing operating system execution using /dev/kmem and /dev/ksyms
• Controlling operating system execution using kmdb
• Kernel crash dumps
• ELF object files
• Raw data files
To start the kernel-level debugger, kmdb, on a running system, start mdb with the -K option. (The lowercase -k option only examines the live kernel; it does not load kmdb.)
Debugging Kernel Crash Dumps:
When the Solaris kernel crashes it will leave a core file in /var/crash on the system that crashed. The debugger needs to be invoked on a system that has the same architecture and Solaris version as the system that crashed. If you do not do this, debugging will not work.
Command             What it does
::showrev           Prints the kernel version information
::status            Prints the module and cause of the kernel panic
::panicinfo         Prints the registers and their values at the time of the panic
::msgbuf            Shows the content of the kernel message buffer
$C                  Prints a kernel-level stack trace for the kernel thread that had the kernel panic
functionname::dis   Prints the disassembly of the function that you specify
::cpuinfo           Shows a large amount of information about the CPUs in the system
::threadlist        Prints a listing of all the kernel threads in the kernel
::findleaks         Helps detect memory leaks; needs a decent amount of kernel options turned on at build time
::vatopfn           Translates a virtual address to a physical address
::whatis            Determines if the address is a pointer to a buffer or a different type of special memory region
Using the Kernel Modular Debugger:
KMDB is extremely powerful. It not only allows you to view what is going on in the system at the instruction level, it also allows you to modify the values of those instructions and execute your own instructions. There is no safety net; you can do whatever you want on the system.
There are two ways to start kmdb on the system that you are trying to debug. The first is to start mdb with the -K option from the system’s console login. The second is to pass the -k option in the kernel boot arguments. On the x86 platform you can do this by modifying the grub configuration file. If you need to set any breakpoints for the boot process, you can also supply the -d flag, and the kernel will enter the debugger so you can set kernel variables and breakpoints to the values that you want. If you are working on the SPARC platform you can pass the options at the boot prompt, but you must also specify the command kmdb.
The commands that are used for debugging kernel crash dumps can also be used when you are debugging the kernel.
Authors problem
February 23, 2006
Nate,
Are you referring to the author part that was in the date tag on each post before?
Mike
Nexenta Review
February 22, 2006
Install:
The installation of Nexenta isn’t exactly the easiest thing in the world, but it’s not the hardest. It’s on par with a more advanced Debian installation. My only real gripe with the installer was that it was slow. It took 2 hours to install under VMWare with no other programs running. That’s just way too much, especially when I can install Fedora and my complete development environment in about an hour and a half.
User Interface:
The user interface is quite similar to Ubuntu’s; it is based on it, after all. The only other thing that annoyed me was that they still included all the KDE applications in the Gnome menus and didn’t bother to put them in another submenu. It just clutters everything up and really bothers me.
Package Management:
Nexenta uses apt for its package management. This is much better and easier to use than the regular Solaris tool, pkgadd, which doesn’t fetch any of the dependencies when you go to install a package. One thing that I found disappointing is that they don’t have a package of the Sun Studio compilers. You’ve got to go and download it yourself. They have gcc as the default, which doesn’t easily build OpenSolaris kernels; you need some extra tools and packages.
Overall, Nexenta is a pretty decent Unix distribution. It is a huge step up from OpenSolaris and the Java Desktop System. It just can’t really compare to most of the Linux distributions that are available, especially in the installation process.
Some solaris links
February 21, 2006
Here are some OpenSolaris links that I’ve found helpful in setting up a kernel debugging environment. I’m going to be using it to investigate kernel locks and dtrace, and to port the NFS patches. I’ll post a guide on how to set up mdb (the Modular Debugger) under Nexenta.
A quick test run
February 17, 2006
I installed one of the Fedora Core 5 betas in the lab today. The new installer’s pretty slick, but it doesn’t currently support an upgrade from Fedora Core 4. I find it’s a bit easier to use, and it seems to be faster even when I’m installing from DVD instead of the usual network-based installation. It took twenty minutes.
We’re using one system in the lab to test it, so hopefully we can be ready to roll out FC5 relatively quickly. It’s got some cool new features in it, and hopefully kickstart is fixed so we can stop using g4u, as that’s a bit too slow for me.
Valgrind can be a sledgehammer
February 15, 2006
If you are trying to track down memory leaks and don’t have a tool like valgrind available on your platform, or valgrind seems like overkill for a small app, a simple solution is to use a global variable: increment it by the number of bytes that you just malloced, and decrement it by the number of bytes that you are about to free. It’s not the most scalable solution, but it works for a decent number of smaller programs.
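Here is a minimal sketch of that counter in C. The wrapper names (counted_malloc, counted_free, counted_outstanding) are my own invention for this example, not part of any library. Since free() doesn’t tell you how big a block was, this version assumes the caller passes the original size back in; it is also not thread safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Bytes currently allocated through the wrappers below. */
static size_t bytes_outstanding = 0;

/* Hypothetical wrapper: allocate and add the size to the counter. */
void *counted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        bytes_outstanding += size;
    return p;
}

/*
 * Hypothetical wrapper: subtract the size, then free.
 * The caller must pass the same size it gave to counted_malloc().
 */
void counted_free(void *p, size_t size)
{
    if (p == NULL)
        return;
    assert(size <= bytes_outstanding); /* catches mismatched sizes early */
    bytes_outstanding -= size;
    free(p);
}

/* Anything above zero at exit was allocated but never freed. */
size_t counted_outstanding(void)
{
    return bytes_outstanding;
}
```

Call counted_outstanding() right before the program exits; if it isn’t zero, that many bytes went through counted_malloc without a matching counted_free. It won’t tell you where the leak is, only that there is one and how big it is.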
Switched my blog
February 12, 2006
Finally switched my blog away from blogger.com. It was a good service, but the feeds coming off of it weren’t well formed, and most of my posts ended up looking like crap. I ended up moving to a wordpress.com account. It’s running the latest version of WordPress, which is what I’ll be using this summer when I switch to a dedicated host that’s probably running a Xen image.
NFSv4
February 2, 2006
A while back Jason did some work on NFSv3 and added some read and write limiting support for the Rapid Recovery paper that we worked on last spring. This support was quite easy to implement for the 2.6.8.1 kernel and was also pretty efficient. I’m currently in the process of adding those features to the 2.6.15.1 kernel, and after that I’m going to add the features to NFSv4 for Linux.
I’m also looking to add a couple of features to NFSv3 and NFSv4 over the course of this semester. The ones that I’m currently looking to add are:
- Append-only permissions
- No-execute mount mode
- File size restrictions
If anyone else has any other ideas for rules, just let me know.
Posted by mccabemt