One Year of Porting - Post-mortem of two Linux/SteamOS launches

Software

leszek-godlewski
Description
2013 was the year in which Linux finally got the attention of game developers; it was also the year in which my first two Linux/SteamOS ports were released. This talk will cover the learnings of one year of porting work from a programmer's point of view: DOs and DON'Ts and issues both expected and unexpected.
Text
  • One Year of Porting Post-mortem of two Linux/SteamOS launches Leszek Godlewski
  • Who is this guy? Leszek Godlewski Programmer, Nordic Games (early 2014 – now) – Unannounced project Freelance Programmer (Sep 2013 – early 2014) – Linux port of Painkiller Hell & Damnation – Linux port of Deadfall Adventures Generalist Programmer, The Farm 51 (Mar 2010 – Aug 2013) – Painkiller Hell & Damnation, Deadfall Adventures
  • Focus ● Not sales figures ● Not business viability ● Not game-specific bugs ● Not the Steam Controller – oops! ● Platform-specific problems ● Mistakes made & mitigation attempts
  • Agenda ● The ports ● Laying down the foundations – Build system – Compilers – Linking – Boilerplate ● Release and feedback – User issues – Crash handling – GLSL shader linking
  • The ports
  • Painkiller Hell & Damnation (The Farm 51)
  • Deadfall Adventures (The Farm 51)
  • Facts ● Unreal Engine 3 ● All major Linux distros – SteamOS, Debian, Ubuntu, Fedora, Arch, Gentoo ● All official drivers – NVIDIA, AMD, Intel (i965) ● Some open-source drivers – Gallium r600, Gallium radeonsi
  • Facts ● Most UE3 middleware has Linux versions – In our case: PhysX, FaceFX, Scaleform GFx, lzopro, Bink... ● Introduced open-source middleware – SDL 2.x, GLEW, Steam Runtime ● UE3's build system – Unreal Build Tool – Handles everything make does – Written in C#, fixed up to run in Mono on Linux ● UE3's content packaging (cooking) system – Linux target based on Mac OSX
  • Facts ● QA department unfamiliar with Linux – Basic training was required ● Installing & running software (including from the command line), file permissions, driver installation, gathering system information... – Mostly reported false positives in the beginning ● Spare time project over ~13 months – After leaving The Farm 51 employment – contracted for further outsourcing directly by TF51 – Occasional support from individual members of TF51 staff ● Kudos to Piotr Bąk and Wojciech Knopf!
  • Overlap ● Noticed how a lot of work was based on OSX code? ● Happens all the time – POSIX – OpenGL/OpenAL MacOS X Linux Mobile
  • Agenda ● The ports ● Laying down the foundations – Build system – Compilers – Linking – Boilerplate ● Release and feedback – User issues – Crash handling – GLSL shader linking
  • Laying down the foundations
  • Starting point ● Epic's OpenGL 2.1 and OpenAL back-ends – OpenGL mode somewhat functional in Windows developer builds ● Epic's Mac OSX port – Limited test builds for Mac OSX had been made before – Mac OSX binary builds supported via remote compilation – Existing Mac OSX target for game content packaging (cooking) ● Both of the above – somewhat... unfinished ● On Windows, the games shipped 32-bit binaries only
  • Building the build tool – C# & Mono ● Patched the Unreal Build Tool to build & run on Mono in Linux – Mono can handle most .NET commandline apps all right ● Added support for Linux toolchains (duh) ● Fixed hardcoding of backslashes in paths – Path.Join() instead ● Fixed regexes on large strings (C++ sources) blowing up the stack – Break up the string into smaller parts
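The "break up the string" fix is easy to illustrate outside of C#. Below is a minimal C++ sketch, assuming the goal is to pull `#include` targets out of a large source file: the pattern is applied one line at a time, so no single regex call ever sees the whole buffer. `FindIncludes` and the pattern itself are hypothetical and are not taken from the Unreal Build Tool.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Returns the targets of all #include directives found in Source.
// The regex runs per line, which keeps each match attempt small.
std::vector<std::string> FindIncludes(const std::string& Source) {
    // Hypothetical pattern: optional whitespace, '#', "include", then <name> or "name".
    static const std::regex IncludeRe(R"re(^\s*#\s*include\s*[<"]([^>"]+)[>"])re");

    std::vector<std::string> Result;
    std::istringstream Stream(Source);
    std::string Line;
    while (std::getline(Stream, Line)) {
        std::smatch Match;
        if (std::regex_search(Line, Match, IncludeRe)) {
            Result.push_back(Match[1].str());
        }
    }
    return Result;
}
```

Splitting on line boundaries only works for patterns that never span lines; a pattern that does would need a different chunking rule.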
  • Cross-compiling for 32/64-bit ● Yes, I agree, 32-bit should die, but one may not be allowed to kill it ● gcc -m32/-m64 is not enough! – Sets target code generation – But not headers & libraries (CRT, OpenMP, libgcc etc.) ● Fixed (on Debian & friends) by installing gcc-multilib – Dependency package for non-default architectures (i.e. i386 on an amd64 system and vice versa)
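A cheap sanity check, sketched here as an assumption rather than something the ports shipped, is to have the code itself report which target it was built for. That shows that `-m32`/`-m64` reached the compiler; the headers and libraries still have to come from the multilib packages. `kPointerBits` and `TargetArchName` are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Pointer width of the target this translation unit was compiled for.
constexpr std::size_t kPointerBits = sizeof(void*) * CHAR_BIT;
static_assert(kPointerBits == 32 || kPointerBits == 64,
              "unexpected pointer width for a desktop Linux target");

// Name of the target architecture, based on macros predefined by gcc and Clang.
const char* TargetArchName() {
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}
```

Logging `TargetArchName()` and `kPointerBits` at startup makes it obvious from a user's log which binary the launch script actually picked.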
  • Clang ● Clang is faster – gcc: 3m47s – Clang: 3m05s ● Clang has different diagnostics than gcc ● Clang has C++ preprocessor macro compatibility with gcc – Declares __GNUC__ etc. ● Clang has commandline compatibility with gcc – Can easily switch back & forth between gcc and Clang
  • Clang - caveats ● Object files may be incompatible with gcc & fail to link (need full rebuilds) ● gcc is more mature than Clang – Clang has generated faulty code for me (YMMV) ● Slight inconsistencies in C++ standard strictness – Templates – Anonymous structs/unions – May need to add this-> in some places – May need to name some anonymous types
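The two source-level fixes can look like the sketch below. This is one plausible reading of the slide, not code from either game: `ArrayBase`, `Stack` and `Color` are hypothetical types. Members inherited from a base class that depends on a template parameter are not found by unqualified lookup under strict two-phase name lookup, hence `this->`. Anonymous structs are a compiler extension in C++, so giving the type and the member a name keeps the code portable.

```cpp
#include <cassert>

template <typename T>
struct ArrayBase {
    int Num = 0;
    void Add(const T&) { ++Num; }
};

template <typename T>
struct Stack : ArrayBase<T> {
    void Push(const T& Item) {
        // Writing plain `Add(Item);` here can be rejected: Add lives in a
        // dependent base class, so it must be qualified.
        this->Add(Item);
    }
    int Count() const { return this->Num; }
};

struct Color {
    // Before: `struct { unsigned char R, G, B, A; };` (anonymous struct).
    // After: the nested type and the member both have names.
    struct Channels { unsigned char R, G, B, A; };
    Channels Ch;
};
```

Call sites change from `C.R` to `C.Ch.R`, which is the main cost of naming the type.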
  • So – Clang or gcc? Both: ● Clang – quick iterations during development ● gcc – final shipping binaries
  • Linking – GNU ld ● Default linker on Linux ● Ancient ● Single-threaded ● Requires specification of libraries in the order of reverse dependency... ● We are not doomed to use it!
  • Linking – GNU gold ● Multi-threaded linker for ELF binaries – ld: 18s – gold: 5s ● Developed at Google, now officially part of GNU binutils ● Drop-in replacement for ld – May need an additional parameter or toolchain setup ● clang++ -B/usr/lib/gold-ld ... ● g++ -fuse-ld=gold ... ● Still needs libs in the order of reverse dependency...
  • Linking – library groups ● Major headache/game-breaker with circular dependencies – ”Proper” fix: re-specify the same libraries over and over again ● Declare library groups instead – Wrap library list with --start-group and --end-group ● Shorthand: -(, -) ● g++ foo.obj -Wl,-( -lA -lB -Wl,-) ● Caveat: results in exhaustive symbol search within the group – Manual warns of possible performance hit – Not observed here, but keep that in mind!
  • Caching the gdb-index ● Large codebase generates heavy debug symbols (hundreds of megabytes) ● gdb generates the index for quick symbol lookup... ● ...at every single gdb startup – Takes several minutes for said codebases – Massive waste of time! ● Solution: cache the index, fold it into the build process! – Full description in the gdb manual – gdb -batch -ex "save gdb-index $(OUTPUT_PATH)/gdb-index" $(BINARY) – objcopy --add-section .gdb_index=$(OUTPUT_PATH)/gdb-index/$(BINARY).gdb-index --set-section-flags .gdb_index=readonly $(BINARY) $(BINARY)
  • Raw X11 or SDL? ● Initially tried rolling my own boilerplate – Basic X11 mouse, window and key press events are easy – Unicode text input is not – Useful windowing is not – Correct GLX is not – Linux joystick API is not – Above all, X11 seems to be on its way out ● Wayland & Mir will have emulation layers, but that's bound to have overhead ● You really want to use SDL 2 instead, trust me – Shameless plug: see my talk from WGK 2013 for benefits of using SDL 2 ☺
  • Agenda ● The ports ● Laying down the foundations – Build system – Compilers – Linking – Boilerplate ● Release and feedback – User issues – Crash handling – GLSL shader linking
  • Release and feedback
  • What we shipped initially with the beta ● 32-bit binaries (64-bit added later on) ● Launch script (~20 lines) – Architecture detection ● Initially a stub for 64-bit with fallback to 32-bit – Steam Runtime injection (if not already present) ● That's about it ☺ ● Explicit dependency on the Steam Runtime – Allows shifting some responsibility to Valve – And, admittedly, to users who insist on using their own dependencies
  • User issues ● Missing/incompatible libraries – Resulting from disabling the Steam Runtime ● Gentoo users, mostly... Maintainer of steam package had chosen to disable it by default – Usually fixed by force-starting Steam with STEAM_RUNTIME=1 ● $ STEAM_RUNTIME=1 steam ● ”Missing” 32-bit NVIDIA OpenGL libraries on 64-bit systems – Apparently, they might end up unreachable by the dynamic linker – Fixed by adding /usr/lib32 to LD_LIBRARY_PATH in the launch script – Also, prompt user to make sure they did install them ● It's an option - ”install compatibility 32-bit libraries”
  • User issues ● No support for DXT texture compression despite capable hardware (GL_EXT_texture_compression_s3tc) – Concerns the open-source drivers – For legal reasons (S3/VIA patents), some distros don't ship it or install it automatically ● E.g. Fedora – If the extension is not advertised by the driver, suggest that the user install libtxc_dxtn ● Often a distro package, so no hassle
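Detecting the missing extension comes down to a whole-token search in the extension list. The sketch below assumes the list is the space-separated string returned by `glGetString(GL_EXTENSIONS)` on an OpenGL 2.1 context; `HasGLExtension` is a hypothetical helper, not UE3 code. A plain substring search is not enough, because one extension name can be a prefix of another.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if Name appears as a complete, space-delimited token in
// ExtensionList. Partial matches inside longer names are ignored.
bool HasGLExtension(const char* ExtensionList, const char* Name) {
    if (ExtensionList == nullptr || Name == nullptr || *Name == '\0') {
        return false;
    }
    const std::size_t NameLen = std::strlen(Name);
    for (const char* P = ExtensionList;
         (P = std::strstr(P, Name)) != nullptr;
         P += NameLen) {
        const bool StartsToken = (P == ExtensionList) || (P[-1] == ' ');
        const bool EndsToken = (P[NameLen] == ' ') || (P[NameLen] == '\0');
        if (StartsToken && EndsToken) {
            return true;
        }
    }
    return false;
}
```

If the call returns false for `GL_EXT_texture_compression_s3tc` on an open-source driver, that is the moment to show the libtxc_dxtn hint instead of failing later on texture upload.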
  • More user issues... ● Graphical glitches... ● Broken V-sync... ● Broken NVIDIA Optimus with open-source multiplexer... ● Looong & unresponsive loading times... ● A whole lot of crashes... ● Most of the above was my fault – not going to bore you with all of this!
  • Crash handling ● Unix signals – Asynchronous IPC notification mechanism in POSIX-compliant systems ● Sources can be the process itself, other processes or the kernel – Default handler terminates process & dumps core for most signals – Can (must?!) specify custom handlers ● Get/set handlers via the sigaction(2) system call – Handler prototype: void handler(int signal, siginfo_t *siginfo, void *context); ● Installed through the sa_sigaction field with the SA_SIGINFO flag set (sa_handler is the one-argument form) ● More information – G. Ben-Yossef, Crash N' Burn: Writing Linux application fault handlers
  • Interesting siginfo_t fields ● si_errno – errno value – Possibly more detailed error code ● si_code – reason for sending the signal – Both general and per signal type – Examples: issued by user, issued by kernel, illegal addressing mode, FP over/underflow, invalid memory permissions, unmapped address etc. ● si_addr – memory location at which fault happened – If applicable: SIGILL, SIGFPE, SIGSEGV, SIGBUS and SIGTRAP
  • Signal handler caveats ● Not safe to allocate or free heap memory! – Fault may have corrupted the allocator's data structures ● Prone to race conditions – Can't share locks with the main program! ● If signalled after locking, you'll deadlock – Can't call async-signal-unsafe functions! ● See manual for signal(7) for a list of safe ones ● Custom handlers do not dump core (a.k.a. minidump) – Mitigated by restoring default handler after custom logging and re-signalling self ● signal(signum, SIG_DFL); raise(signum);
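Put together, the last three slides suggest a handler along the lines of the sketch below. It is a minimal illustration under stated assumptions, not the handler shipped with the games: the function names, the list of signals and the log format are all hypothetical. It uses only the stack, `write(2)`, `signal()` and `raise()`, logs a few `siginfo_t` fields, and then restores the default action and re-raises so that a core is still dumped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <unistd.h>

// Writes Value as lowercase hex into Out using only the stack (no heap, no stdio).
// Returns the number of characters written; output is truncated to OutSize.
std::size_t FormatHex(std::uintptr_t Value, char* Out, std::size_t OutSize) {
    static const char Digits[] = "0123456789abcdef";
    char Reversed[2 * sizeof(std::uintptr_t)];
    std::size_t Len = 0;
    do {
        Reversed[Len++] = Digits[Value & 0xF];
        Value >>= 4;
    } while (Value != 0);
    const std::size_t Count = Len < OutSize ? Len : OutSize;
    for (std::size_t i = 0; i < Count; ++i) Out[i] = Reversed[Len - 1 - i];
    return Count;
}

void WriteText(const char* Text, std::size_t Length) {
    const ssize_t Ignored = write(STDERR_FILENO, Text, Length);
    (void)Ignored;  // nothing sensible to do on failure inside a crash handler
}

void WriteHexField(const char* Label, std::uintptr_t Value) {
    char Buffer[2 * sizeof(std::uintptr_t)];
    std::size_t LabelLen = 0;
    while (Label[LabelLen] != '\0') ++LabelLen;
    WriteText(Label, LabelLen);
    WriteText(Buffer, FormatHex(Value, Buffer, sizeof(Buffer)));
}

// Three-argument handler; requires SA_SIGINFO.
void CrashHandler(int Signal, siginfo_t* Info, void* /*Context*/) {
    WriteHexField("fatal signal 0x", static_cast<std::uintptr_t>(Signal));
    WriteHexField(" si_code 0x", static_cast<unsigned>(Info->si_code));
    WriteHexField(" si_addr 0x", reinterpret_cast<std::uintptr_t>(Info->si_addr));
    WriteText("\n", 1);
    // Restore the default action and re-raise so the kernel dumps core.
    signal(Signal, SIG_DFL);
    raise(Signal);
}

// Installs CrashHandler for a hypothetical set of fatal signals.
bool InstallCrashHandler() {
    struct sigaction Action = {};
    Action.sa_sigaction = CrashHandler;
    Action.sa_flags = SA_SIGINFO;
    sigemptyset(&Action.sa_mask);
    const int Signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT};
    for (int Signal : Signals) {
        if (sigaction(Signal, &Action, nullptr) != 0) return false;
    }
    return true;
}
```

Calling `InstallCrashHandler()` once at startup is enough. Hex output is used because formatting it needs no library calls beyond `write(2)`.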
  • Safe stack walking ● glibc provides backtrace() and friends ● Symbols are read from the dynamic symbol table – Must pass -rdynamic to gcc/Clang to populate ● Calling backtrace_symbols() allocates heap memory – Not safe... ☹ – Still, can get away with it most of the time – Proper solution involves a separate watchdog process & pipes (heap-less backtrace_symbols_fd() call instead)
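The heap-less variant can be sketched as follows, assuming glibc's `<execinfo.h>`. `PreloadBacktrace` and `DumpBacktrace` are hypothetical names. The glibc manual page for `backtrace(3)` notes that the first call may load libgcc dynamically, which can allocate, so the sketch makes one throwaway call at startup, outside any signal handler.

```cpp
#include <cassert>
#include <execinfo.h>
#include <unistd.h>

// Call once at startup so that later calls from a signal handler do not
// trigger the lazy loading (and possible allocation) on first use.
void PreloadBacktrace() {
    void* Dummy[1];
    backtrace(Dummy, 1);
}

// Writes the current call stack to Fd without going through malloc-backed
// backtrace_symbols(). Returns the number of frames captured.
int DumpBacktrace(int Fd) {
    void* Frames[64];
    const int Count = backtrace(Frames, 64);
    backtrace_symbols_fd(Frames, Count, Fd);
    return Count;
}
```

`Fd` can be `STDERR_FILENO`, a log file opened in advance, or the write end of a pipe to the watchdog process mentioned on the slide. Without `-rdynamic` the output contains raw addresses rather than function names.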
  • Long load times? Unresponsiveness? ● Profiling quickly places blame on shader linking – OpenGL shader model operates on program objects, created by linking shader pipeline combinations ● Introduces lots of redundancy (see glGetProgramiv() & glGetShaderiv()) ● Drivers often defer actual compilation until ”link time” ● Increased memory consumption – UE3 OpenGL renderer blocks the render thread for linking ● Render thread blocked → Frozen loading screen! – Both games have thousands of shaders ● An awful lot of vertex/fragment shader combinations (programs) ● Moreover – makes async level streaming blocking! – Bad stuttering during gameplay ● Situation better on subsequent loads on NVIDIA due to in-driver cache
  • Shader linking ● Short-term fix: background shader linking – Worker thread with a separate OpenGL context, sharing data with the main one – Queue all shader link jobs, execute on the worker only – If on a loading screen, keep spinning it while waiting for the shaders – Defer ”async streaming done” notifications till shader link queue is empty ● Pros: – Quick & easy to implement – Fixes gameplay stuttering ● Cons: – Only fixes unresponsiveness, not the long load times ☹
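The queueing side of the short-term fix can be sketched without any OpenGL calls. `ShaderLinkQueue` below is a hypothetical class, not UE3 code: it shows only the job queue, the single worker, and the "is the queue empty yet?" check that gates the streaming notification. A real port would make the shared OpenGL context current at the top of `Run()` and issue the actual link calls inside each job.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class ShaderLinkQueue {
public:
    using Job = std::function<void()>;

    ShaderLinkQueue() : Worker([this] { Run(); }) {}

    ~ShaderLinkQueue() {
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            bQuit = true;
        }
        WakeWorker.notify_one();
        Worker.join();  // remaining jobs are drained before the worker exits
    }

    // Called from the render thread instead of linking in place.
    void Enqueue(Job LinkJob) {
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            Jobs.push(std::move(LinkJob));
            ++Pending;
        }
        WakeWorker.notify_one();
    }

    // Poll this before sending the "async streaming done" notification.
    bool IsIdle() {
        std::lock_guard<std::mutex> Lock(Mutex);
        return Pending == 0;
    }

    // Blocking variant; a loading screen would poll IsIdle() and keep drawing.
    void WaitUntilIdle() {
        std::unique_lock<std::mutex> Lock(Mutex);
        Idle.wait(Lock, [this] { return Pending == 0; });
    }

private:
    void Run() {
        // Assumption: the worker's own GL context would be made current here.
        std::unique_lock<std::mutex> Lock(Mutex);
        for (;;) {
            WakeWorker.wait(Lock, [this] { return bQuit || !Jobs.empty(); });
            if (Jobs.empty()) return;  // only reachable once bQuit is set
            Job Current = std::move(Jobs.front());
            Jobs.pop();
            Lock.unlock();
            Current();  // the link itself runs without holding the lock
            Lock.lock();
            if (--Pending == 0) Idle.notify_all();
        }
    }

    std::mutex Mutex;
    std::condition_variable WakeWorker;
    std::condition_variable Idle;
    std::queue<Job> Jobs;
    int Pending = 0;
    bool bQuit = false;
    std::thread Worker;  // declared last: starts after the other members exist
};
```

`Pending` counts jobs that are queued or running, so `IsIdle()` stays false while the last link is still in progress.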
  • Shader linking ● Disaster on the official AMD Catalyst driver! – Total system hang (PC needs hard reset) – Apparently, exposed a race condition in AMD driver – AMD has yet to ship the fix... ● Fallback to old, blocking code path if Catalyst detected
  • Shader linking ● Possible improvement (suggested by Epic): ARB_separate_shader_objects – Replaces programs (and linking) with much lighter pipeline objects ● Removes a lot of redundancy – Makes use of separate vertex/fragment shaders (D3D-like) – Would play well with UE3's RHI, modelled mostly after D3D ● Not implemented ☹ – requires shader syntax upgrade and a refactor of UE3's OpenGL renderer – Explicit locations for attributes and varyings required for SSO – Need to bump GLSL from 1.20 (OpenGL 2.1) to at least 1.40 (OpenGL 3.1)
  • Shader linking ● Proper fix: deferred shader access – Modern drivers queue shader compiles and links internally and process them in a multithreaded manner ● Official NVIDIA & AMD Catalyst ● Open-source Mesa drivers in SteamOS (patches pushed upstream recently by Valve) – Kick all the jobs (i.e. create shader objects) at level load – Do not access the objects (query, draw) until they are needed – Not even the compile/link status! This creates a sync point! ● Not implemented ☹ – requires a considerable refactor of UE3's OpenGL renderer
  • Summary
  • Takeaway 1/2 ● Porting .NET-based tools to Linux is viable ● Many 32/64-bit cross-compile issues are solved with gcc-multilib ● Switching back and forth between Clang and gcc is easy and useful ● Link times can be greatly improved by using gold ● Caching the gdb-index improves debugging experience ● Using SDL 2 is way better than rolling your own boilerplate
  • Takeaway 2/2 ● Using the Steam Runtime is good for you ● Crash handling in Linux is easy to do, tricky to get right ● OpenGL shader model is significantly different from D3D's ● GLSL linking is slow, so defer access if possible ● Multiple concurrent OpenGL contexts can still bite you ● Test on different GPU drivers to avoid unpleasant surprises!
  • Questions? ● E-mail: lgodlewski@nordicgames.at ● Twitter: @TheIneQuation ● Web: www.inequation.org
  • Thank you! ● Further Nordic Games information: www.nordicgames.at ● Development information: www.grimloregames.com