Monday, July 18, 2016

Just when you thought we couldn't take this any further...

ctypes.sh, our quest to build a toolkit for interacting with native code directly from bash scripts, has reached version 1.1. Apart from the standard bug fixes and improvements, the major enhancement in this release is automatic structure support.

Wait, what?
First, some background: ctypes.sh is similar to the Python ctypes module, but for bash. If you’ve ever wanted to access native libraries in your shell scripts (libm, zlib, gtk+, etc.), or use system facilities (poll, select, setitimer, sockets[1], pthreads, etc.), and who doesn’t want that, ctypes.sh can make it happen.


ctypes.sh isn’t a script, it’s a plugin -- bash allows you to load new features at runtime via enable -f plugin.so. I know, who knew?


Here’s a fun demo, a port of the GTK+3 Hello World to bash! Notice that we even generate function pointers to bash functions that can be called from native code, so you can provide callbacks!


ctypes.sh takes care of translating between bash and native code, and this works really well for simple data types (int, float, strings, etc). Things can get complicated when you need to use a struct * parameter.


Python solves it the same way ctypes.sh did - the user has to manually translate the structure into a usable form. In Python you create a class with matching members, and in bash you create an array.


That works, but it’s laborious and not much fun.
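
To get a feel for the manual approach, here is a rough sketch in plain bash of describing struct itimerval by hand. It assumes Linux x86-64, where time_t and suseconds_t are both long, and it assumes the convention of listing one type name per member. The index constants are hypothetical helpers for this illustration, not part of ctypes.sh.

```bash
#!/bin/bash
# Hand-written description of:
#   struct timeval   { time_t tv_sec; suseconds_t tv_usec; };
#   struct itimerval { struct timeval it_interval; struct timeval it_value; };
# Assumption: Linux x86-64, where both timeval members are long.
itimerval=(long long long long)

# Hypothetical index constants, so the flattened members can be referred to by name.
IT_INTERVAL_SEC=0 IT_INTERVAL_USEC=1 IT_VALUE_SEC=2 IT_VALUE_USEC=3

echo "members: ${#itimerval[@]}"
echo "it_value.tv_sec has type ${itimerval[IT_VALUE_SEC]}"
```

Nested structures have to be flattened by hand, and any layout difference on another platform means editing the list again, which is the chore that automatic import is meant to remove.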


Starting from ctypes.sh 1.1, most[2] of the time we can automatically import structures and create a bash data structure that looks like the native equivalent.


Let’s look at an example, and then I’ll explain how we do it. Here's how you would call stat().


#!/bin/bash
source ctypes.sh

# Define the format of struct stat for bash
struct stat passwd

# Allocate some space for the stat buffer
sizeof -m statbuf stat

# call stat
dlcall stat "/etc/passwd" $statbuf

# Convert result into bash structure
unpack $statbuf passwd

printf "/etc/passwd\n"
printf "\tuid:  %s\n" ${passwd[st_uid]}
printf "\tgid:  %s\n" ${passwd[st_gid]}
printf "\tmode: %o\n" ${passwd[st_mode]##*:}
printf "\tsize: %s\n" ${passwd[st_size]}


(Error checking omitted for clarity, full version here)
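
The ${passwd[st_mode]##*:} expansion in the script above deserves a note. Judging by that expansion, unpacked members appear to hold type-prefixed strings, so the prefix has to be stripped before the number can be formatted. The sketch below uses a made-up sample value in that assumed form (uint:33188) and involves no ctypes.sh calls.

```bash
#!/bin/bash
# Made-up sample value in the assumed type:value form.
st_mode="uint:33188"

# ${var##*:} removes the longest prefix matching "*:", leaving only the number.
raw=${st_mode##*:}

# ${var%%:*} removes the longest suffix matching ":*", leaving only the type name.
type=${st_mode%%:*}

printf 'type=%s value=%s octal=%o\n' "$type" "$raw" "$raw"
# prints: type=uint value=33188 octal=100644
```

The %s lines in the script print the value as stored, prefix included, while the %o line needs a bare number, hence the extra expansion there.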

All ctypes.sh commands have builtin help (use help struct, for example), and there's a wiki with examples and documentation.

How?
There’s enough information in the compiler’s debugging data for us to reconstruct the original types, so we parse it and translate it into a format that can be used in bash. It’s surprising how well this works!


$ source ctypes.sh
$ struct itimerval interval
$ echo ${interval[it_value.tv_sec]}

long


In future, we expect to be able to automatically import enums, macros[3] and parameter types as well. We’re using the fantastic libdwarves behind the scenes, which provides a convenient API for extracting and parsing DWARF data.


This does mean that DWARF data needs to be available, but this is simple on most platforms. There are more detailed troubleshooting steps available here, but in general:


  • On Red Hat or CentOS, try debuginfo-install glibc
  • On Fedora, try dnf debuginfo-install glibc
  • On Debian or Ubuntu, try apt-get install libc6-dbg


An interesting problem we had to solve was that bash stores associative arrays as hash tables, and discards the ordering of elements. You can test this yourself: no matter how you assign elements, the order is forgotten.


$ declare -A hello
$ hello[foo]=1 hello[bar]=2 hello[baz]=3 hello[quz]=4
$ echo ${hello[@]}
2 4 3 1

There is no way of recovering or influencing the order of associative array elements[4], so an ordinary associative array can’t be used for storing structures, which must maintain the order of their members.
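
For comparison, the usual plain-bash workaround is to record the insertion order yourself in a second, indexed array. This is only a sketch of that idea and not something ctypes.sh does; the set_member helper and the array names are hypothetical.

```bash
#!/bin/bash
declare -A hello=()   # values, kept in hash order by bash
declare -a order=()   # keys, kept in insertion order by us

# Assign a value, remembering the key the first time it is seen.
set_member() {
    local key=$1 value=$2
    [[ ${hello[$key]+set} ]] || order+=("$key")
    hello[$key]=$value
}

set_member foo 1; set_member bar 2; set_member baz 3; set_member quz 4

# Collect the values by walking the keys in insertion order, not hash order.
out=()
for key in "${order[@]}"; do
    out+=("${hello[$key]}")
done
echo "${out[@]}"   # prints: 1 2 3 4
```

The catch is that every assignment has to go through the helper. A direct hello[key]=value bypasses the bookkeeping, so this does not help when scripts are expected to assign to structure members directly.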


A quick reminder: a hash table has a fixed number of buckets, and each bucket is just a linked list. When you insert a new element, it gets appended to the list at table->bucket[hash(key) % nbuckets].


Luckily for us, the bash plugin API allows plugins to set the bucket size (the number of buckets) used when creating an associative array. So, what happens if we make a test plugin that creates a new associative array with the bucket size set to 1?


   entry = make_new_array_variable("hello");
   entry->value = assoc_create(1); // bucket_size = 1
   entry->attributes = att_assoc;


$ enable -f test.so onebucket
$ onebucket hello
$ hello[foo]=1 hello[bar]=2 hello[baz]=3 hello[quz]=4
$ echo ${hello[@]}
1 2 3 4

All elements get appended to the same linked list, and so the order of elements is maintained!

We use this trick to create associative arrays that remember the order elements were assigned, and can export them back to native structures correctly.
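
The effect of the bucket count can be reproduced with a toy model in plain bash. This is a simplified sketch, not bash's implementation: each bucket is modelled as a space-separated list of keys, new keys are appended as described above, and toy_hash is a made-up stand-in (the sum of the character codes) for the real hash function.

```bash
#!/bin/bash
# Made-up hash: the sum of the character codes of the key.
toy_hash() {
    local key=$1 sum=0 i code
    for ((i = 0; i < ${#key}; i++)); do
        printf -v code '%d' "'${key:i:1}"
        ((sum += code))
    done
    echo "$sum"
}

# Insert the given keys into nbuckets buckets, then print them in bucket-walk order.
walk_order() {
    local nbuckets=$1; shift
    local -a bucket=()
    local key idx
    for key in "$@"; do
        idx=$(( $(toy_hash "$key") % nbuckets ))
        bucket[idx]+="${bucket[idx]:+ }$key"   # append to that bucket's list
    done
    echo "${bucket[@]}"
}

walk_order 4 foo bar baz quz   # prints: foo quz bar baz
walk_order 1 foo bar baz quz   # prints: foo bar baz quz
```

With four buckets the walk order depends on where each key happens to hash; with one bucket it is simply insertion order. The price is that every lookup becomes a linear scan of a single list, which is a reasonable trade for structures with a modest number of members.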

Why?



Because we can.

When?



Right now!


The new features are documented on the wiki, the new release has been made, there are fresh examples in the test directory and the issue tracker is ready to receive any bugs you find.


And of course, we’re eagerly awaiting your mail asking if this is serious.



  1. Yes, bash has some basic builtin support for sockets, but it is hardly comprehensive.
  2. Some complicated structures might fail; we’re working on it.
  3. Macros are only included in debugging data if you use cc -g3 or similar. Nobody does this because it makes really big binaries, but we have some workarounds planned.
  4. Well, okay, I guess you could brute force a key prefix to influence the hashes or something.
