You are aware of our statements indicating “Microsoft Loves Linux.” Over the last couple of years the open source activities at Microsoft have accelerated all around me. Recently I blogged about the design of of thedebugger bridgeand the use of LLDB. In this post I want to highlight my recent debugging session into LLDB and show you how we are contributing to LLDB.
We captured a core dump on a 2TB RAM system. When loading it with LLDB the core dump would open but we could not symbolize stacks or see specific memory that was expected to be present in the dump. Opening the dump with gdb allowed us to see the memory region and dumping the PT_LOAD headers with readelf showed the memory region was present in the dump.
I started with LLDB logging. LLDB has logging capabilities to assist in understanding what is transpiring.
You can use the command “log list” to see thechannel(gdb-remote, LLDB, …) and supported loggingcategories. Using the-O command line parameter,when you start LLDB, indicates to LLDB to execute the command before the files are loaded. This lets you log information while loading the core dump.
LLDB loads the target and associated shared libraries (I.E. DLLs) during the target create action.
In the log you can see LLDB is setting the path to that of the specified executable and using that path to look for shared libraries (.so files.)
The rendezvous logic is where LLDB attempts to find information in the dump overlapping the shared modules. When I stepped through the logic theDYLDis the key. DYLD (dynamic loading) is where LLDB is using the core dump information (named memory regions such as [vdso]) to locate the dynamic object (list maps) and other information in an attempt to obtain the shared module details.
You can see LLDB reading over the sections by name but LLDB terminates the search early and can’t find the symbol sections. Without the symbol sections variables and stack decoding is limited.
… SectionLoadList::SetSectionLoadAddress (section = 0x5642dc4a62e0 (./bin/sqlservr.dbg..bss), load_addr = 0x00007f5e4bfb5e80) module = 0x5642dc4919b0 DYLDRendezvous::Resolve address size: 8, padding 4 ResolveRendezvousAddress info_location = 0x7f5e4bfb4dc0 ResolveRendezvousAddress reading pointer (8 bytes) from 0x7f5e4bfb4dc0 ResolveRendezvousAddress FAILED – could not read from the info location: core file does not contain 0x7f5e4bfb4dc0 DYLDRendezvous::Resolve cursor = 0xffffffffffffffff DynamicLoaderPOSIXDYLD::DidAttach() pid 8917 rendezvous could not yet resolve: adding breakpoint to catch future rendezvous setup DynamicLoaderPOSIXDYLD::ProbeEntry pid 8917 GetEntryPoint() returned address 0x7f5e4bd2d83c, setting entry breakpoint DynamicLoaderPOSIXDYLD::DidAttach told the target about the modules that loaded: — [module] ./bin/sqlservr.dbg (pid 8917) Core file‘/media/rdorr/Scratch1/Temp/CoreWontLoad/core.sqlservr.8917’ (x86_64) was loaded.
Why is the read failing?
Taking a closer look at the log and stepping though the LLDB code I found the read failure occurring because the memory address could not be located in the tracked VM map. The VM map is built from the PT_LOAD entries in the core dump file. PT_LOAD is an entry containing the address range, permissions and offset in the dump where the memory region is stored.
I turned toreadelf –hto dump the ELF header associated with the core dump file.
ELF Header: Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 Class: ELF64 Data: 2’s complement, little endian Version: 1 (current) OS/ABI: UNIX – System V ABI Version: 0 Type: CORE (Core file) Machine: Advanced Micro Devices X86-64 Version: 0x1 Entry point address: 0x0 Start of program headers: 64 (bytes into file) Start of section headers: 16880770624 (bytes into file) Flags: 0x0 Size of this header: 64 (bytes) Size of program headers: 56 (bytes) Number of program headers: 65535 (106714) Size of section headers: 64 (bytes) Number of section headers: 0 (106716) Section header string table index:65535 (106715)
Interestingly, I found the program_headers printed65535 (106714). What this tells us is the ELF header has a special marker (0xFFFF=65535.) When this marker is placed in the program and section header count members, it indicates that section_header holds the actual count values. The ELF header is limited to a 16 bit value or max of 0xFFFF and on the 2TB system the number of memory regions exceeded 65535 when the dump was captured.
You can use readelf –l or –S to see the program/section header information as well as dump the PT_LOAD. Using this technique I was able to confirm the PT_LOAD entries existed for the region in which the read was returning failure on. I could also deduce that the PT_LOAD entry for this region was beyond the 65535th entry in the core dump.
LOAD 0x00000003ee293e280x00007f5e4bfb10000x0000000000000000 0x0000000000004000 0x0000000000004000 R 1
ResolveRendezvousAddress FAILED – could not read from the info location: core file does not contain0x7f5e4bfb4dc0
With a bit more stepping I found that LLDB’s, load core logic is looping over the program headers and building up the VM map. The loop is controlled by the ELF header value, in this case 65535. LLDB didn’t have logic to detect the 0xFFFF signature and read the section_header, at the ‘Start of section headers’ offset, in order to obtain the actual counts. The gdb and readelf utilities read the proper value of (106714) but LLDB used 65535. Thus, LLDB is not reading the available PT_LOAD headers and the region of memory needed appears after the 65535th entry.
In parallel I made changes to an internal utility (future blog to come) and my esteemed collogue Eugene, who did the majority of heavy lifting for dbgbridge, made changes to LLDB code. Once Eugene allowed LLDB to read all program and section headers we are able to load and properly debug the large, core dump. The change is now under review so it can be added back to the LLDB code base.
Along with this contribution we have uncovered a few other issues, provided possible fixes, and shared these with the LLDB community. The lldb-dev community has been very responsive to engage and help address issues. This is really a neat experience and I can truly see our use of LLDB driving more advancements for everyone.
FEB 1, 2017 UPDATE – CHANGE ACCEPTED
This revision was automatically updated to reflect the committed changes.
Commit rL293714: Open ELF core dumps with more than 64K sections