One Year of Porting - Post-mortem of two Linux/SteamOS launches


2013 was the year in which Linux finally got the attention of game developers; it was also the year in which my first two Linux/SteamOS ports were released. This talk will cover lessons learned from one year of porting work from a programmer's point of view: DOs and DON'Ts and issues both expected and unexpected.
  • One Year of Porting Post-mortem of two Linux/SteamOS launches Leszek Godlewski
  • Who is this guy? Leszek Godlewski Programmer, Nordic Games (early 2014 – now) – Unannounced project Freelance Programmer (Sep 2013 – early 2014) – Linux port of Painkiller Hell & Damnation – Linux port of Deadfall Adventures Generalist Programmer, The Farm 51 (Mar 2010 – Aug 2013) – Painkiller Hell & Damnation, Deadfall Adventures
  • Focus ● Not sales figures ● Not business viability ● Not game-specific bugs ● Not the Steam Controller – oops! ● Platform-specific problems ● Mistakes made & mitigation attempts
  • Agenda ● The ports ● Laying down the foundations – Build system – Compilers – Linking – Boilerplate ● Release and feedback – User issues – Crash handling – GLSL shader linking
  • The ports
  • Painkiller Hell & Damnation (The Farm 51)
  • Deadfall Adventures (The Farm 51)
  • Facts ● Unreal Engine 3 ● All major Linux distros – SteamOS, Debian, Ubuntu, Fedora, Arch, Gentoo ● All official drivers – NVIDIA, AMD, Intel (i965) ● Some open-source drivers – Gallium r600, Gallium radeonsi
  • Facts ● Most of UE3's middleware has Linux versions – In our case: PhysX, FaceFX, Scaleform GFx, lzopro, Bink... ● Introduced open-source middleware – SDL 2.x, GLEW, Steam Runtime ● UE3's build system – Unreal Build Tool – Handles everything make does – Written in C#, fixed up to run in Mono on Linux ● UE3's content packaging (cooking) system – Linux target based on Mac OSX
  • Facts ● QA department unfamiliar with Linux – Basic training was required ● Installing & running software (including from the command line), file permissions, driver installation, gathering system information... – Mostly reported false positives in the beginning ● Spare time project over ~13 months – After leaving employment at The Farm 51 – contracted for further outsourcing directly by TF51 – Occasional support from individual members of TF51 staff ● Kudos to Piotr Bąk and Wojciech Knopf!
  • Overlap ● Noticed how a lot of work was based on OSX code? ● Happens all the time – POSIX – OpenGL/OpenAL ● [Diagram: overlapping areas of Mac OS X, Linux and mobile platforms]
  • Laying down the foundations
  • Starting point ● Epic's OpenGL 2.1 and OpenAL back-ends – OpenGL mode somewhat functional in Windows developer builds ● Epic's Mac OSX port – Limited test builds for Mac OSX had been made before – Mac OSX binary builds supported via remote compilation – Existing Mac OSX target for game content packaging (cooking) ● Both of the above – somewhat... unfinished ● On Windows, the games shipped 32-bit binaries only
  • Building the build tool – C# & Mono ● Patched the Unreal Build Tool to build & run on Mono in Linux – Mono can handle most .NET command-line apps all right ● Added support for Linux toolchains (duh) ● Fixed hardcoding of backslashes in paths – Path.Combine() instead ● Fixed regexes on large strings (C++ sources) blowing up the stack – Break up the string into smaller parts
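The regex fix was made in C# inside the Unreal Build Tool. As a sketch of the same idea in C++, assuming a hypothetical scanner that collects #include directives (the name scan_includes is illustrative): matching one line at a time keeps every regex input short, instead of feeding a whole source file to a single search.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical dependency scanner: collects the targets of #include directives.
// Instead of running one regex over a whole (possibly multi-megabyte) source
// file, it runs the pattern on one line at a time, so each search sees a
// short input.
std::vector<std::string> scan_includes(const std::string& source) {
    static const std::regex include_re(R"re(^\s*#\s*include\s*[<"]([^>"]+)[>"])re");
    std::vector<std::string> includes;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch match;
        if (std::regex_search(line, match, include_re)) {
            includes.push_back(match[1].str());
        }
    }
    return includes;
}
```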
  • Cross-compiling for 32/64-bit ● Yes, I agree, 32-bit should die, but one may not be allowed to kill it ● gcc -m32/-m64 is not enough! – Sets target code generation – But not headers & libraries (CRT, OpenMP, libgcc etc.) ● Fixed (on Debian & friends) by installing gcc-multilib – Dependency package for non-default architectures (i.e. i386 on an amd64 system and vice versa)
  • Clang ● Clang compiles faster – gcc: 3m47s – Clang: 3m05s ● Clang has different diagnostics than gcc ● Clang has C++ preprocessor macro compatibility with gcc – Declares __GNUC__ etc. ● Clang has command-line compatibility with gcc – Can easily switch back & forth between gcc and Clang
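Because Clang also defines __GNUC__, code that needs to tell the two compilers apart should test __clang__ first. A minimal sketch; the function names are illustrative:

```cpp
// Clang also defines __GNUC__ (and __GNUC_MINOR__ etc.) for gcc compatibility,
// so compiler-specific code has to test __clang__ first.
const char* compiler_name() {
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "unknown";
#endif
}

// True when the gcc-compatibility macros are visible; holds for both gcc and Clang.
bool has_gnuc_macros() {
#if defined(__GNUC__)
    return true;
#else
    return false;
#endif
}
```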
  • Clang - caveats ● Object files may be incompatible with gcc & fail to link (need full rebuilds) ● gcc is more mature than Clang – Clang has generated faulty code for me (YMMV) ● Slight inconsistencies in C++ standard strictness – Templates – Anonymous structs/unions – May need to add this-> in some places – May need to name some anonymous types
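A minimal illustration of the this-> case, using hypothetical Base/Derived templates. Clang rejects the unqualified call because names from a dependent base class are not found by unqualified lookup; older gcc releases accepted it.

```cpp
template <typename T>
struct Base {
    int count() const { return 1; }
};

template <typename T>
struct Derived : Base<T> {
    int twice() const {
        // `return count() * 2;` is rejected by Clang: Base<T> is a dependent
        // base class, so unqualified names are not looked up in it.
        // Qualifying the call with this-> defers lookup to instantiation time.
        return this->count() * 2;
    }
};
```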
  • So – Clang or gcc? Both: ● Clang – quick iterations during development ● gcc – final shipping binaries
  • Linking – GNU ld ● Default linker on Linux ● Ancient ● Single-threaded ● Requires specification of libraries in the order of reverse dependency... ● We are not doomed to use it!
  • Linking – GNU gold ● Multi-threaded linker for ELF binaries – ld: 18s – gold: 5s ● Developed at Google, now officially part of GNU binutils ● Drop-in replacement for ld – May need an additional parameter or toolchain setup ● clang++ -B/usr/lib/gold-ld ... ● g++ -fuse-ld=gold ... ● Still needs libs in the order of reverse dependency...
  • Linking – library groups ● Major headache/game-breaker with circular dependencies – "Proper" fix: re-specify the same libraries over and over again ● Declare library groups instead – Wrap the library list with --start-group and --end-group ● Shorthand: -( and -) (quote them in the shell) ● g++ foo.obj -Wl,--start-group -lA -lB -Wl,--end-group ● Caveat: results in exhaustive symbol search within the group – Manual warns of possible performance hit – Not observed here, but keep that in mind!
  • Caching the gdb-index ● Large codebase generates heavy debug symbols (hundreds of megabytes) ● gdb generates the index for quick symbol lookup... ● ...on every single gdb startup – Takes several minutes for such codebases – Massive waste of time! ● Solution: cache the index, fold it into the build process! – Full description in the gdb manual – gdb -batch -ex "save gdb-index $(OUTPUT_PATH)/gdb-index" $(BINARY) – objcopy --add-section .gdb_index=$(OUTPUT_PATH)/gdb-index/$(BINARY).gdb-index --set-section-flags .gdb_index=readonly $(BINARY) $(BINARY)
  • Raw X11 or SDL? ● Initially tried rolling my own boilerplate – Basic X11 mouse, window and key press events are easy – Unicode text input is not – Useful windowing is not – Correct GLX is not – Linux joystick API is not – Above all, X11 seems to be on its way out ● Wayland & Mir will have emulation layers, but that's bound to have overhead ● You really want to use SDL 2 instead, trust me – Shameless plug: see my talk from WGK 2013 for benefits of using SDL 2 ☺
  • Release and feedback
  • What we shipped initially with the beta ● 32-bit binaries (64-bit added later on) ● Launch script (~20 lines) – Architecture detection ● Initially a stub for 64-bit with fallback to 32-bit – Steam Runtime injection (if not already present) ● That's about it ☺ ● Explicit dependency on the Steam Runtime – Allows shifting some responsibility to Valve – And, admittedly, to users who insist on using their own dependencies
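The launch script itself is shell. As a rough C++ rendering of its architecture detection, assuming hypothetical binary names (the real ones are not given here), the decision boils down to inspecting the machine name that uname(2) reports and falling back to 32-bit otherwise:

```cpp
#include <string>
#include <sys/utsname.h>

// Returns the kernel's machine name (what `uname -m` prints), e.g. "x86_64".
std::string current_machine() {
    struct utsname info;
    if (uname(&info) != 0) {
        return "";
    }
    return info.machine;
}

// Hypothetical binary names. Picks the 64-bit build on x86_64 and falls
// back to the 32-bit build for anything else.
std::string pick_binary(const std::string& machine) {
    return machine == "x86_64" ? "game-x86_64" : "game-x86";
}
```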
  • User issues ● Missing/incompatible libraries – Resulting from disabling the Steam Runtime ● Gentoo users, mostly... The maintainer of the steam package had chosen to disable it by default – Usually fixed by force-starting Steam with STEAM_RUNTIME=1 ● $ STEAM_RUNTIME=1 steam ● "Missing" 32-bit NVIDIA OpenGL libraries on 64-bit systems – Apparently, they might end up unreachable by the dynamic linker – Fixed by adding /usr/lib32 to LD_LIBRARY_PATH in the launch script – Also, prompt the user to make sure they did install them ● It's an installer option: "install compatibility 32-bit libraries"
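The LD_LIBRARY_PATH fix can be sketched as a small helper. It assumes /usr/lib32 is appended rather than prepended (the slide does not say which) and avoids adding a duplicate entry; the function name is hypothetical:

```cpp
#include <string>

// Appends `dir` to a colon-separated search path (such as LD_LIBRARY_PATH)
// unless it is already one of the entries. `current` may be null when the
// variable is unset.
std::string append_search_dir(const char* current, const std::string& dir) {
    const std::string path = current ? current : "";
    // Pad with colons so that only whole entries match, not substrings.
    const std::string padded = ":" + path + ":";
    if (padded.find(":" + dir + ":") != std::string::npos) {
        return path;
    }
    return path.empty() ? dir : path + ":" + dir;
}
```

A launcher would pass the result to setenv("LD_LIBRARY_PATH", ..., 1) and then exec the game binary. The dynamic linker reads the variable at process startup, which is why the change belongs in the launcher rather than in the already running game.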
  • User issues ● No support for DXT texture compression despite capable hardware (GL_EXT_texture_compression_s3tc) – Concerns the open-source drivers – For legal reasons (S3/VIA patents), some distros don't ship it or don't install it automatically ● E.g. Fedora – If the extension is not advertised by the driver, suggest that the user install libtxc_dxtn ● Often a distro package, so no hassle
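Deciding whether to show the libtxc_dxtn hint requires checking the extension list. With the OpenGL 2.1 contexts used here, that list is the space-separated string returned by glGetString(GL_EXTENSIONS). The sketch below (function name hypothetical) matches whole tokens only, since a plain strstr() would also accept longer extension names that merely begin with the requested one.

```cpp
#include <cstring>

// Exact-token search in a space-separated GL_EXTENSIONS string.
bool has_gl_extension(const char* extensions, const char* name) {
    if (!extensions || !name || !*name) {
        return false;
    }
    const std::size_t len = std::strlen(name);
    const char* p = extensions;
    while ((p = std::strstr(p, name)) != nullptr) {
        // The match must start at a token boundary...
        const bool starts = (p == extensions) || (p[-1] == ' ');
        // ...and end at one, otherwise it is a prefix of a longer name.
        const bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends) {
            return true;
        }
        p += len;
    }
    return false;
}
```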
  • More user issues... ● Graphical glitches... ● Broken V-sync... ● Broken NVIDIA Optimus with open-source multiplexer... ● Looong & unresponsive loading times... ● A whole lot of crashes... ● Most of the above was my fault – not going to bore you with all of this!
  • Crash handling ● Unix signals – Asynchronous IPC notification mechanism in POSIX-compliant systems ● Sources can be the process itself, other processes or the kernel – Default handler terminates process & dumps core for most signals – Can (must?!) specify custom handlers ● Get/set handlers via the sigaction(2) system call – Handler prototype (sa_sigaction field, requires the SA_SIGINFO flag): void handler(int signal, siginfo_t *siginfo, void *context); ● More information – G. Ben-Yossef, Crash N' Burn: Writing Linux application fault handlers
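A minimal sketch of installing an extended handler with sigaction(2). With SA_SIGINFO the three-argument handler goes into sa_sigaction, not sa_handler. The handler name and the global variable are illustrative:

```cpp
#include <csignal>
#include <cstring>

// Written by the handler; volatile sig_atomic_t is a type the C and C++
// standards allow a signal handler to store to.
static volatile std::sig_atomic_t g_last_signal = 0;

// Extended handler: with SA_SIGINFO the kernel passes a siginfo_t
// (si_code, si_addr, ...) and the interrupted context (a ucontext_t* on Linux).
static void on_signal(int signo, siginfo_t* info, void* context) {
    (void)info;
    (void)context;
    g_last_signal = signo;
}

// Installs on_signal for `signo` via sigaction(2). Returns false on failure.
bool install_handler(int signo) {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;  // not sa_handler: SA_SIGINFO selects this field
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);     // block nothing extra while the handler runs
    return sigaction(signo, &sa, nullptr) == 0;
}
```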
  • Interesting siginfo_t fields ● si_errno – errno value – Possibly more detailed error code ● si_code – reason for sending the signal – Both general and per signal type – Examples: issued by user, issued by kernel, illegal addressing mode, FP over/underflow, invalid memory permissions, unmapped address etc. ● si_addr – memory location at which fault happened – If applicable: SIGILL, SIGFPE, SIGSEGV, SIGBUS and SIGTRAP
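As an illustration, the si_code values behind the examples above can be turned into log text. This sketch returns only string literals, so it allocates nothing; SI_KERNEL is Linux-specific, and the function name is hypothetical:

```cpp
#include <csignal>
#include <string_view>

// Turns (signal, si_code) into text for a crash log. Returns views of string
// literals only, so no memory is allocated and the text can go straight to write(2).
std::string_view describe_fault(int signo, int code) {
    if (code == SI_USER) return "sent by kill()";
    if (code == SI_KERNEL) return "sent by the kernel";  // Linux-specific code
    switch (signo) {
    case SIGILL:
        if (code == ILL_ILLADR) return "illegal addressing mode";
        break;
    case SIGFPE:
        if (code == FPE_FLTOVF) return "floating-point overflow";
        if (code == FPE_FLTUND) return "floating-point underflow";
        break;
    case SIGSEGV:
        if (code == SEGV_MAPERR) return "address not mapped to object";
        if (code == SEGV_ACCERR) return "invalid permissions for mapped object";
        break;
    default:
        break;
    }
    return "other reason";
}
```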
  • Signal handler caveats ● Not safe to allocate or free heap memory! – Fault may have corrupted the allocator's data structures ● Prone to race conditions – Can't share locks with the main program! ● If signalled after locking, you'll deadlock – Can't call async-signal-unsafe functions! ● See the signal(7) manual page for a list of safe ones ● Custom handlers do not dump core (the Linux counterpart of a Windows minidump) – Mitigated by restoring the default handler after custom logging and re-signalling self ● signal(signum, SIG_DFL); raise(signum);
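Putting the caveats together, a crash handler might log using only async-signal-safe calls and then hand over to the default action. A sketch under those assumptions; the message text and the set of signals are illustrative choices:

```cpp
#include <csignal>
#include <cstring>
#include <unistd.h>

// Logs with write(2) only, then restores the default disposition and
// re-raises, so the default action (terminate + core dump, if enabled via
// ulimit -c) still happens after the custom logging.
static void crash_handler(int signo, siginfo_t* info, void* context) {
    (void)info;
    (void)context;
    static const char msg[] = "fatal signal caught, re-raising with default handler\n";
    ssize_t written = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)written;
    std::signal(signo, SIG_DFL);
    std::raise(signo);  // stays pending until the handler returns (signo is blocked)
}

// Installs crash_handler for common fatal signals. Returns false on failure.
bool install_crash_handlers() {
    const int fatal_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT};
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    for (int signo : fatal_signals) {
        if (sigaction(signo, &sa, nullptr) != 0) {
            return false;
        }
    }
    return true;
}
```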
  • Safe stack walking ● glibc provides backtrace() and friends ● Symbols are read from the dynamic symbol table – Must pass -rdynamic to gcc/Clang to populate ● Calling backtrace_symbols() allocates heap memory – Not safe... ☹ – Still, can get away with it most of the time – Proper solution involves a separate watchdog process & pipes (heap-less backtrace_symbols_fd() call instead)
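A sketch of the heap-free variant, assuming the file descriptor is a pipe to the watchdog process (not shown). The priming call follows the glibc backtrace(3) man page: backtrace() lives in libgcc, which is loaded dynamically on first use, and that loading can allocate. Function names are hypothetical.

```cpp
#include <execinfo.h>
#include <unistd.h>

// Call once at startup, so that libgcc is already loaded and later calls
// (e.g. from a crash handler) do not trigger an allocation.
void prime_backtrace() {
    void* frame[1];
    backtrace(frame, 1);
}

// Writes one line per frame to `fd`. Unlike backtrace_symbols(), it does not
// return a heap-allocated array. Function names resolve only for symbols in
// the dynamic symbol table, hence -rdynamic. Returns the number of frames.
int dump_backtrace(int fd) {
    void* frames[64];
    const int depth = backtrace(frames, 64);
    backtrace_symbols_fd(frames, depth, fd);
    return depth;
}
```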
  • Long load times? Unresponsiveness? ● Profiling quickly places blame on shader linking – OpenGL shader model operates on program objects, created by linking shader pipeline combinations ● Introduces lots of redundancy (see glGetProgramiv() & glGetShaderiv()) ● Drivers often defer actual compilation until "link time" ● Increased memory consumption – UE3 OpenGL renderer blocks the render thread for linking ● Render thread blocked → Frozen loading screen! – Both games have thousands of shaders ● An awful lot of vertex/fragment shader combinations (programs) ● Moreover – makes async level streaming blocking! – Bad stuttering during gameplay ● Situation better on subsequent loads on NVIDIA due to in-driver cache
  • Shader linking ● Short-term fix: background shader linking – Worker thread with a separate OpenGL context, sharing data with the main one – Queue all shader link jobs, execute on the worker only – If on a loading screen, keep spinning it while waiting for the shaders – Defer "async streaming done" notifications till shader link queue is empty ● Pros: – Quick & easy to implement – Fixes gameplay stuttering ● Cons: – Only fixes unresponsiveness, not the long load times ☹
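A reduced sketch of the background-linking queue, assuming link jobs can be expressed as callables. The shared OpenGL context on the worker thread, which the real renderer needs, is deliberately left out, so this only shows the threading and the "wait until the queue is empty" step. Class and method names are hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// One worker thread drains a FIFO of shader link jobs.
class LinkQueue {
public:
    LinkQueue() : worker_([this] { run(); }) {}

    ~LinkQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();  // remaining jobs are drained before the thread exits
    }

    void push(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
            ++pending_;
        }
        wake_.notify_one();
    }

    // Returns once every job queued so far has run. `tick` is called about
    // every 16 ms meanwhile, e.g. to keep a loading screen spinning; a
    // streaming "done" notification would be sent only after this returns.
    void wait_idle(const std::function<void()>& tick) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (pending_ != 0) {
            lock.unlock();
            tick();
            lock.lock();
            done_.wait_for(lock, std::chrono::milliseconds(16),
                           [this] { return pending_ == 0; });
        }
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
            if (jobs_.empty()) {
                return;  // stopping and nothing left to link
            }
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop_front();
            lock.unlock();
            job();  // the slow link runs without holding the lock
            lock.lock();
            if (--pending_ == 0) {
                done_.notify_all();
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable done_;
    std::deque<std::function<void()>> jobs_;
    std::size_t pending_ = 0;  // queued plus currently running
    bool stopping_ = false;
    std::thread worker_;  // declared last: starts after the members above exist
};
```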
  • Shader linking ● Disaster on the official AMD Catalyst driver! – Total system hang (PC needs hard reset) – Apparently, exposed a race condition in AMD driver – AMD has yet to ship the fix... ● Fallback to old, blocking code path if Catalyst detected
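The fallback needs some way to recognize the proprietary driver. One possible check, assuming Catalyst identifies itself through glGetString(GL_VENDOR) as "ATI Technologies Inc.", is a simple prefix test; function names are hypothetical.

```cpp
#include <string_view>

// Assumption: the proprietary Catalyst driver reports GL_VENDOR as
// "ATI Technologies Inc."; other drivers report different vendors.
bool is_catalyst(std::string_view vendor) {
    return vendor.substr(0, 16) == "ATI Technologies";
}

// Background linking hung the whole system on Catalyst, so keep the old
// blocking path there and use the worker thread everywhere else.
bool use_background_linking(std::string_view vendor) {
    return !is_catalyst(vendor);
}
```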
  • Shader linking ● Possible improvement (suggested by Epic): ARB_separate_shader_objects – Replaces programs (and linking) with much lighter pipeline objects ● Removes a lot of redundancy – Makes use of separate vertex/fragment shaders (D3D-like) – Would play well with UE3's RHI, modelled mostly after D3D ● Not implemented ☹ – requires shader syntax upgrade and a refactor of UE3's OpenGL renderer – Explicit locations for attributes and varyings required for SSO – Need to bump GLSL from 1.20 (OpenGL 2.1) to at least 1.40 (OpenGL 3.1)
  • Shader linking ● Proper fix: deferred shader access – Modern drivers queue shader compiles and links internally and process them in a multithreaded manner ● Official NVIDIA & AMD Catalyst ● Open-source Mesa drivers in SteamOS (patches pushed upstream recently by Valve) – Kick all the jobs (i.e. create shader objects) at level load – Do not access the objects (query, draw) until they are needed – Not even the compile/link status! This creates a sync point! ● Not implemented ☹ – requires a considerable refactor of UE3's OpenGL renderer
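A hypothetical wrapper illustrating deferred access: the compile/link job is assumed to have been kicked off at level load, and the first link-status query is postponed until a draw actually needs the program. The query is injected as a callable standing in for glGetProgramiv(..., GL_LINK_STATUS, ...), so the sketch runs without an OpenGL context.

```cpp
#include <functional>
#include <utility>

// The link status is queried lazily, because the query forces the driver to
// finish the queued compile/link job (a sync point).
class DeferredProgram {
public:
    DeferredProgram(unsigned id, std::function<bool(unsigned)> query_link_status)
        : id_(id), query_link_status_(std::move(query_link_status)) {}

    unsigned id() const { return id_; }

    // Call right before the first draw that uses this program; only the first
    // call touches the driver, later calls reuse the cached result.
    bool ready_for_draw() {
        if (!checked_) {
            linked_ = query_link_status_(id_);
            checked_ = true;
        }
        return linked_;
    }

private:
    unsigned id_;
    std::function<bool(unsigned)> query_link_status_;
    bool checked_ = false;
    bool linked_ = false;
};
```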
  • Summary
  • Takeaway 1/2 ● Porting .NET-based tools to Linux is viable ● Many 32/64-bit cross-compile issues are solved with gcc-multilib ● Switching back and forth between Clang and gcc is easy and useful ● Link times can be greatly improved by using gold ● Caching the gdb-index improves debugging experience ● Using SDL 2 is way better than rolling your own boilerplate
  • Takeaway 2/2 ● Using the Steam Runtime is good for you ● Crash handling in Linux is easy to do, tricky to get right ● OpenGL shader model is significantly different from D3D's ● GLSL linking is slow, so defer access if possible ● Multiple concurrent OpenGL contexts can still bite you ● Test on different GPU drivers to avoid unpleasant surprises!
  • Questions? ● lgodlewski@nordicgames.at ● @TheIneQuation ● www.inequation.org
  • Thank you! ● Further Nordic Games information: www.nordicgames.at ● Development information: www.grimloregames.com