OpenPKG: 32bit vs. 64bit

Lessons learned from attempting to create a pure 64bit OpenPKG instance on a system made from mixed 32bit/64bit parts. OpenPKG builds and runs fine in any “pure” environment where parts are either all 32bit or all 64bit. Problems and limitations arise from any attempt to mix 32/64.

The involved parts which can be 32/64bit are:

  • CPU
  • kernel
  • glibc
  • toolchain (binutils, gcc)
  • headers (/usr/include)

Facts

  1. a 32bit CPU cannot run any 64bit code natively (that would require emulation).
  2. a 32bit kernel will not execute 64bit code.
  3. a 32bit OS installation, even when run by a 64bit kernel on a 64bit CPU, requires an additional 64bit glibc to be installed to provide the 64bit ABI needed to execute 64bit applications.
  4. to build 64bit code you need a 64bit toolchain (binutils/gcc). Even valid 64bit code might result in broken applications if their data structures are taken from 32bit API headers and are therefore sized for the 32bit ABI.
  5. to build 64bit applications that work with 64bit ABI, 64bit API headers are required.

Note: 1+2+3 is the minimum requirement to run 64bit application binaries that were created on another, pure 64bit machine. A developer building such binaries needs a pure 64bit machine (or must take the cross-compile route).
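
To see which ABI a given toolchain invocation actually targets (fact 4), a tiny probe that reports the widths of pointers and long can help. This is a minimal sketch and was not part of the original investigation; the function names are made up for illustration.

```c
#include <limits.h>
#include <stdio.h>

/* Width in bits of a pointer for the ABI this file was compiled for. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Width in bits of long; 32 on i386 Linux, 64 on x86_64 Linux (LP64). */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Print a one-line summary so that differently built binaries can be compared. */
void print_abi_summary(FILE *out)
{
    fprintf(out, "pointer=%dbit long=%dbit\n", pointer_bits(), long_bits());
}
```

Calling print_abi_summary(stdout) from main() shows what the compiler targets. It says nothing about whether the headers match that target, which is a separate question.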

The concrete situation being evaluated was a pure 32bit Debian 4.0 installation whose hard disks were moved to more powerful 64bit CPU hardware, with a 64bit Debian 4.0 kernel installed and running.

OpenPKG failed to build its toolchain. gcc detected a 64bit environment based on the running 64bit kernel, but the 64bit support was incomplete and the build failed. The problem was assumed to be with binutils. Various attempts to force the OpenPKG binutils and gcc into 64bit mode failed.

An OpenPKG 64bit toolchain was then created on another, pure 64bit machine, where it worked fine. The same binaries were then deployed to the mixed machine. This toolchain was able to create valid 64bit code once the mixed machine had 64bit glibc installed (proof that 1+2+3 is the minimum). Building and running simple “Hello, World!” applications worked fine.

However, almost every practical application failed to run. We examined gzip, which was the first to break in the OpenPKG bootstrap process. The code was valid 64bit code, but the application misbehaved. Interestingly, the error was compiled into the application: our pure 64bit machine exhibited the same problem when running the broken binary imported from the mixed machine. The problem was tracked down to “struct stat”, which comes from the /usr/include headers. On 32bit Debian 4.0 this structure differs from the one on 64bit Debian 4.0, showing that 32bit and 64bit Debian are not just the same OS recompiled; they differ from the inside out.

A solution would probably be to use 64bit headers, but this would surely break the 32bit parts of the OS. So the practical and clean solution, as described in the very first paragraph, is to go pure 32bit or pure 64bit for building applications. It should be possible to run those 64bit applications on a mixed system with minimal compatibility, as described above.
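
To compare the “struct stat” that two sets of headers describe, it can help to dump the offsets and sizes of a few members and diff the output of a build from each machine. The following is a rough sketch under that assumption; the SHOW macro and the function names are illustrative and not taken from the original investigation.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Print the offset and size of one struct stat member. */
#define SHOW(out, member) \
    fprintf((out), "%-9s offset=%3zu size=%zu\n", #member, \
            offsetof(struct stat, member), \
            sizeof(((struct stat *)0)->member))

/* Dump the layout that the headers used at compile time describe. */
void print_stat_layout(FILE *out)
{
    fprintf(out, "struct stat size=%zu\n", sizeof(struct stat));
    SHOW(out, st_dev);
    SHOW(out, st_ino);
    SHOW(out, st_mode);
    SHOW(out, st_nlink);
    SHOW(out, st_uid);
    SHOW(out, st_size);
}

/* Size in bytes of the st_nlink member as seen by this build. */
size_t stat_nlink_size(void)
{
    return sizeof(((struct stat *)0)->st_nlink);
}
```

Any difference between the output of a build on the pure 64bit machine and a build on the mixed machine points at members whose layout comes from the wrong headers.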

The appropriate ABI (Application Binary Interface) must be supplied to run binaries. 32bit applications must call the 32bit ABI and 64bit applications must call the 64bit ABI. The 64bit kernel (loader) does the switching, but a 32bit OS installation lacks the 64bit ABI. It is optional and must be installed; on Debian 4.0 use “apt-get install libc6-amd64 libc6-dev-amd64”.
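
Which ABI a binary needs is recorded in the identification bytes at the start of its ELF header. The sketch below reads that byte and reports 32 or 64. It assumes a Linux system providing <elf.h>; the function names are hypothetical helpers, not an existing tool.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Return 32 or 64 for a valid ELF identification block, -1 otherwise. */
int elf_class_from_ident(const unsigned char *ident, size_t len)
{
    if (len < EI_NIDENT || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return -1;
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return -1;
    }
}

/* Read the identification block of a file and classify it. */
int elf_class_of_file(const char *path)
{
    unsigned char ident[EI_NIDENT];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    n = fread(ident, 1, sizeof ident, fp);
    fclose(fp);
    return elf_class_from_ident(ident, n);
}
```

A result of 64 for a binary on a 32bit OS installation means the 64bit glibc must be present before the binary can run.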

Developers building binaries must compile their applications against the API (Application Programming Interface), and the appropriate headers are required to describe the data structures for the target ABI. It might be possible to tweak the toolchain to include the correct headers, but this is not easily achieved (it can be done, as cross-compiling proves), and the developer will quickly end up in a situation where pure 32bit or pure 64bit is the practical way to go.
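
A header/ABI mismatch of this kind could in principle be caught at compile time instead of at run time. The following sketch refuses to compile for a 64bit Linux/x86_64 target when the st_nlink member is not 8 bytes wide, which is the width this article reports for the 64bit ABI. The guard and the helper name are assumptions made for illustration; other platforms are deliberately left unchecked.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Assumption: on Linux/x86_64 (LP64) st_nlink is 8 bytes wide.
 * __x86_64__ and __ILP32__ are macros predefined by the compiler,
 * so the condition reflects the compiler target, not the headers.
 */
#if defined(__linux__) && defined(__x86_64__) && !defined(__ILP32__)
_Static_assert(sizeof(((struct stat *)0)->st_nlink) == 8,
               "64bit target but struct stat comes from non-64bit headers");
#endif

/* Size in bytes of st_nlink as declared by the headers in use. */
int nlink_field_bytes(void)
{
    return (int)sizeof(((struct stat *)0)->st_nlink);
}
```

Placed in a commonly included source file, such a check turns the silent misbehaviour into a build failure on a machine with mismatched headers.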

The following application, when built on a 64bit machine with the wrong (32bit) headers, will incorrectly show sizeof(sb.st_nlink) as 4 where it must be 8 for the 64bit ABI. The hardlink count is checked because gzip refused to compress a file, giving a reason like “has 49999 other hardlinks”. It turned out that it read garbage from the st_nlink field.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <unistd.h>

int main (int argc, char *argv[])
{
    struct stat sb;
    int rc;
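    /* sizeof yields size_t, which is printed with %zu */
    printf("INFO: sizeof(sb)=%zu sizeof(sb.st_nlink)=%zu\n",
           sizeof(sb), sizeof(sb.st_nlink));
    if ((rc = stat("hello.c", &sb)) == -1) {
        /* stat() returns -1 on failure and reports the cause in errno */
        printf("ERROR: %s\n", strerror(errno));
        exit(1);
    }
    /* cast because the width of st_nlink depends on the headers in use */
    printf("INFO: hello.c has %lu hard-links\n", (unsigned long)sb.st_nlink);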
    printf("Hello, World!\n");
    return 0;
}

Initial state with CPU+kernel=64, glibc+toolchain+headers=32

# uname -a
Linux foo 2.6.18-4-amd64 #1 SMP Thu May 10 01:01:58 UTC 2007 x86_64 GNU/Linux
# as --version
GNU assembler 2.17 Debian GNU/Linux [...]
This assembler was configured for a target of `i486-linux-gnu'.
# gcc hello.c
# gcc -m32 hello.c
# gcc -m64 hello.c
/usr/bin/ld: cannot find -lgcc_s
collect2: ld returned 1 exit status

using 64bit toolchain built on and imported from another pure 64bit machine

# /openpkg/bin/gcc -m64 -o hello hello.c
# ./hello
-bash: ./hello: cannot execute binary file

Install 64bit glibc compatibility

# apt-get install libc6-amd64 libc6-dev-amd64

build on a pure 64bit machine, run on either pure or mixed machine

# /openpkg/bin/gcc -o hello hello.c
# ln hello.c foo; ln hello.c bar; ./hello
INFO: sizeof(sb)=144 sizeof(sb.st_nlink)=8
INFO: hello.c has 3 hard-links
Hello, World!

build on the mixed machine, run on either pure or mixed machine

# /openpkg/bin/gcc -o hello hello.c
# ./hello
INFO: sizeof(sb)=144 sizeof(sb.st_nlink)=4
INFO: hello.c has 0 hard-links
Hello, World!

Because the mixed machine had 32bit headers, the data structure “sb.st_nlink” was configured to 32bit ABI, breaking the 64bit application which is routed to use the 64bit ABI.
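
How a field ends up holding garbage when writer and reader disagree about the layout can be shown with two deliberately simplified structures. These are hypothetical stand-ins, not the real 32bit and 64bit “struct stat” definitions, and the field values are made up. The writer stores a link count of 3, and the reader looks for it at the offset its own layout prescribes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout used by the side that fills in the data. */
struct writer_view {
    uint64_t dev;
    uint64_t ino;
    uint64_t nlink;
};

/* Hypothetical layout assumed by code compiled against other headers. */
struct reader_view {
    uint64_t dev;
    uint32_t ino;
    uint32_t nlink;   /* overlaps the second half of the writer's ino */
    uint32_t mode;
    uint32_t pad;
};

/* Store real_nlink via the writer layout, read it back via the reader layout. */
uint32_t misread_nlink(uint64_t real_nlink)
{
    struct writer_view w = { .dev = 1, .ino = 42, .nlink = real_nlink };
    struct reader_view r;

    _Static_assert(sizeof w == sizeof r, "both views must cover the same bytes");
    memcpy(&r, &w, sizeof r);
    return r.nlink;
}
```

With these made-up layouts, misread_nlink(3) does not return 3, because the reader picks up bytes that belong to a different field.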
