Introduction

Variables are sneaky. Sometimes they'll happily sit in a register, only to end up on the stack as soon as you turn around. For the sake of optimization, the compiler may throw them out of the window entirely. No matter how variables move through memory, we need some way to track and manipulate them in the debugger. This article shows how to handle variables in a debugger and demonstrates a simple implementation using libelfin.

Before starting, make sure you are using the fbreg branch of my fork of libelfin. It contains a few hacks to support getting the base address of the current stack frame and evaluating location lists, neither of which native libelfin provides. You may need to pass -gdwarf-2 to GCC to generate compatible DWARF information. But before getting to the implementation, I'll detail how location encoding works in the latest DWARF 5 specification. If you want to know more, you can get the standard here.

DWARF locations

The location of a variable in memory at a given moment is encoded in the DWARF information using the DW_AT_location attribute. A location description can be a single location description, a composite location description, or a location list.
DW_AT_location is encoded in one of three ways, depending on the kind of location description. exprloc encodes simple and composite location descriptions; it consists of a byte length followed by a DWARF expression or location description. loclist and loclistptr encode location lists; they provide an index or offset into the .debug_loclists section, which holds the actual location lists.
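To make the "byte length followed by an expression" layout concrete, here is a minimal sketch of splitting an exprloc value into its length prefix and its expression bytes. The length is stored as an unsigned LEB128 number. The names uleb128_result, decode_uleb128, and exprloc_body are hypothetical helpers written for this illustration; they are not part of libelfin, which does this parsing internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of decoding one ULEB128 number: the value and how many input
// bytes it occupied.
struct uleb128_result {
    std::uint64_t value;
    std::size_t   length;
};

// Decode an unsigned LEB128 number starting at `offset`.
// Each byte carries 7 payload bits; the top bit means "more bytes follow".
// Note: this sketch does not report truncated input, it just stops.
inline uleb128_result decode_uleb128(const std::vector<std::uint8_t>& bytes,
                                     std::size_t offset = 0) {
    std::uint64_t value = 0;
    unsigned shift = 0;
    std::size_t i = offset;
    while (i < bytes.size()) {
        const std::uint8_t byte = bytes[i++];
        if (shift < 64) {
            value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        }
        shift += 7;
        if ((byte & 0x80) == 0) break;  // last byte of this number
    }
    return {value, i - offset};
}

// Given the raw bytes of an exprloc value, return only the DWARF
// expression that follows the ULEB128 length prefix.
inline std::vector<std::uint8_t> exprloc_body(
        const std::vector<std::uint8_t>& form) {
    const uleb128_result len = decode_uleb128(form);
    const std::size_t begin = len.length;
    const std::size_t available = form.size() - begin;
    // Clamp so that a damaged length never reads past the buffer.
    const std::size_t count =
        len.value < available ? static_cast<std::size_t>(len.value) : available;
    return std::vector<std::uint8_t>(form.begin() + begin,
                                     form.begin() + begin + count);
}
```

For instance, the three bytes 0x02 0x91 0x6c would split into a length of 2 and the expression 0x91 0x6c, which is DW_OP_fbreg with an offset of -20.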
DWARF expressions

The actual location of a variable is computed with a DWARF expression, which is a series of operations that manipulate a stack of values. There are a great many DWARF operations available, so I won't explain them all in detail. A couple of examples give a taste: DW_OP_fbreg <offset> means the value is stored at <offset> from the base of the current stack frame, and DW_OP_reg5 means the value lives in DWARF register 5. Don't be afraid of all this, though; libelfin will handle the complexity for us.
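The stack-machine idea can be sketched in a few dozen lines. The toy evaluator below understands only four operations (DW_OP_lit0 through DW_OP_lit31, DW_OP_plus, DW_OP_plus_uconst, and DW_OP_fbreg) and takes the frame base as a plain number. The opcode values come from the DWARF specification; toy_context, read_uleb128, read_sleb128, and evaluate_toy_expression are hypothetical names for this illustration and are not how libelfin structures its evaluator.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Opcode values as defined by the DWARF specification.
constexpr std::uint8_t DW_OP_plus        = 0x22;
constexpr std::uint8_t DW_OP_plus_uconst = 0x23;
constexpr std::uint8_t DW_OP_lit0        = 0x30;  // lit0..lit31 are 0x30..0x4f
constexpr std::uint8_t DW_OP_fbreg       = 0x91;

// Hypothetical stand-in for the debugger state an evaluator would query.
struct toy_context {
    std::uint64_t frame_base;
};

// Read an unsigned LEB128 operand and advance `pos` past it.
inline std::uint64_t read_uleb128(const std::vector<std::uint8_t>& code,
                                  std::size_t& pos) {
    std::uint64_t result = 0;
    unsigned shift = 0;
    std::uint8_t byte = 0;
    do {
        if (pos >= code.size()) throw std::runtime_error{"truncated operand"};
        byte = code[pos++];
        if (shift < 64) result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}

// Read a signed LEB128 operand: same as above, then sign-extend from
// bit 6 of the final byte.
inline std::int64_t read_sleb128(const std::vector<std::uint8_t>& code,
                                 std::size_t& pos) {
    std::uint64_t result = 0;
    unsigned shift = 0;
    std::uint8_t byte = 0;
    do {
        if (pos >= code.size()) throw std::runtime_error{"truncated operand"};
        byte = code[pos++];
        if (shift < 64) result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 64 && (byte & 0x40)) result |= ~std::uint64_t{0} << shift;
    return static_cast<std::int64_t>(result);
}

// Run the expression and return whatever is on top of the stack at the end.
inline std::uint64_t evaluate_toy_expression(const std::vector<std::uint8_t>& code,
                                             const toy_context& ctx) {
    std::vector<std::uint64_t> stack;
    std::size_t pos = 0;
    while (pos < code.size()) {
        const std::uint8_t op = code[pos++];
        if (op >= DW_OP_lit0 && op <= DW_OP_lit0 + 31) {
            stack.push_back(op - DW_OP_lit0);            // push a small literal
        } else if (op == DW_OP_fbreg) {
            // push frame base + signed offset (unsigned wraparound handles negatives)
            stack.push_back(ctx.frame_base +
                            static_cast<std::uint64_t>(read_sleb128(code, pos)));
        } else if (op == DW_OP_plus_uconst) {
            if (stack.empty()) throw std::runtime_error{"stack underflow"};
            stack.back() += read_uleb128(code, pos);     // add constant to top
        } else if (op == DW_OP_plus) {
            if (stack.size() < 2) throw std::runtime_error{"stack underflow"};
            const std::uint64_t top = stack.back();
            stack.pop_back();
            stack.back() += top;                         // pop two, push the sum
        } else {
            throw std::runtime_error{"opcode not supported by this sketch"};
        }
    }
    if (stack.empty()) throw std::runtime_error{"expression left an empty stack"};
    return stack.back();
}
```

With a frame base of 0x1000, the two-byte expression 0x91 0x6c (DW_OP_fbreg -20) evaluates to 0xfec. A real evaluator also has to deal with registers, dereferencing target memory, and composite locations, which is exactly the work libelfin takes off our hands.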
DWARF types

DWARF's representation of types needs to be powerful enough to give debugger users useful representations of their variables. Users most often want to debug at the level of their application rather than at the level of the machine, and they need a good idea of what their variables are doing.

DWARF types are encoded in DIEs along with most of the other debugging information. They can have attributes indicating their name, encoding, size, byte order, and so on. A myriad of type tags are available to express pointers, arrays, structures, typedefs, and anything else you might see in a C or C++ program.
Take this simple structure as an example:
    struct test {
        int i;
        float j;
        int k[42];
        test* next;
    };
The parent DIE of this structure is like this:
    < 1><0x0000002a>  DW_TAG_structure_type
                        DW_AT_name       "test"
                        DW_AT_byte_size  0x000000b8
                        DW_AT_decl_file  0x00000001 test.cpp
                        DW_AT_decl_line  0x00000001

This says that we have a structure called test, of size 0xb8, declared on line 1 of test.cpp. It is followed by a number of child DIEs that describe the members.
    < 2><0x00000032>  DW_TAG_member
                        DW_AT_name                  "i"
                        DW_AT_type                  <0x00000063>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000002
                        DW_AT_data_member_location  0
    < 2><0x0000003e>  DW_TAG_member
                        DW_AT_name                  "j"
                        DW_AT_type                  <0x0000006a>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000003
                        DW_AT_data_member_location  4
    < 2><0x0000004a>  DW_TAG_member
                        DW_AT_name                  "k"
                        DW_AT_type                  <0x00000071>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000004
                        DW_AT_data_member_location  8
    < 2><0x00000056>  DW_TAG_member
                        DW_AT_name                  "next"
                        DW_AT_type                  <0x00000084>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000005
                        DW_AT_data_member_location  176(as signed = -80)

Each member has a name, a type (given as a DIE offset), a declaration file and line, and a byte offset into the structure at which the member is located. The types they point to come next.

    < 1><0x00000063>  DW_TAG_base_type
                        DW_AT_name       "int"
                        DW_AT_encoding   DW_ATE_signed
                        DW_AT_byte_size  0x00000004
    < 1><0x0000006a>  DW_TAG_base_type
                        DW_AT_name       "float"
                        DW_AT_encoding   DW_ATE_float
                        DW_AT_byte_size  0x00000004
    < 1><0x00000071>  DW_TAG_array_type
                        DW_AT_type       <0x00000063>
    < 2><0x00000076>  DW_TAG_subrange_type
                        DW_AT_type       <0x0000007d>
                        DW_AT_count      0x0000002a
    < 1><0x0000007d>  DW_TAG_base_type
                        DW_AT_name       "sizetype"
                        DW_AT_byte_size  0x00000008
                        DW_AT_encoding   DW_ATE_unsigned
    < 1><0x00000084>  DW_TAG_pointer_type
                        DW_AT_type       <0x0000002a>

As you can see, int on my laptop is a 4-byte signed integer type, and float is a 4-byte floating point number. The integer array type is defined by pointing to int as its element type and to sizetype (think size_t) as its index type, with 0x2a (42) elements. The test* type is a DW_TAG_pointer_type that refers to the test DIE.
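The offsets and sizes in this dump can be cross-checked against what the compiler itself reports. The sketch below assumes an x86-64 Linux (LP64) target like the one the dump came from; on another ABI the static_asserts may legitimately fail. The layout_check namespace and the member_address helper are hypothetical names used only here. member_address shows the arithmetic a debugger performs with DW_AT_data_member_location.

```cpp
#include <cstddef>
#include <cstdint>

namespace layout_check {

// Same structure as in the article, wrapped in a namespace to avoid
// clashing with other names.
struct test {
    int   i;
    float j;
    int   k[42];
    test* next;
};

// These mirror the DW_AT_data_member_location and DW_AT_byte_size values
// from the dump. They are assumptions about an LP64 target, not guarantees.
static_assert(offsetof(test, i) == 0,      "i at offset 0");
static_assert(offsetof(test, j) == 4,      "j follows the 4-byte int");
static_assert(offsetof(test, k) == 8,      "k follows the 4-byte float");
static_assert(offsetof(test, next) == 176, "8 + 42 * 4 = 176");
static_assert(sizeof(test) == 0xb8,        "176 + 8-byte pointer = 184");

// Address of a member, given the address of the enclosing structure and
// the member's DW_AT_data_member_location.
inline std::uintptr_t member_address(std::uintptr_t struct_address,
                                     std::size_t data_member_location) {
    return struct_address + data_member_location;
}

}  // namespace layout_check
```

Given the address of a test object in the debuggee, member_address(address, 176) would be where to read the 8 bytes of its next pointer.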
Implementing a simple variable reader

As mentioned above, libelfin will handle most of the complexity for us. However, it does not implement every way of representing variable locations, and handling all of them in our code would get very complex. I have therefore chosen to support only exprloc for now. Feel free to add support for more types of expression as needed. If you're really brave, submit patches to libelfin to help complete the necessary support!

Handling variables is mostly about locating the different parts in memory or registers; after that, reading and writing work the same way as before. To keep things simple, I'll only show how to implement reading.
First we need to tell libelfin how to read registers from our process. We create a class that inherits from expr_context and use ptrace to handle everything:
    class ptrace_expr_context : public dwarf::expr_context {
    public:
        ptrace_expr_context (pid_t pid) : m_pid{pid} {}

        // Value of a register, looked up by its DWARF register number
        dwarf::taddr reg (unsigned regnum) override {
            return get_register_value_from_dwarf_register(m_pid, regnum);
        }

        // Current program counter of the traced process
        dwarf::taddr pc() override {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, m_pid, nullptr, &regs);
            return regs.rip;
        }

        // Read from the traced process's memory
        dwarf::taddr deref_size (dwarf::taddr address, unsigned size) override {
            //TODO take into account size
            return ptrace(PTRACE_PEEKDATA, m_pid, address, nullptr);
        }

    private:
        pid_t m_pid;
    };
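One possible way to address the TODO in deref_size is to mask the word that PTRACE_PEEKDATA returns down to the requested number of bytes. The helper below is a sketch under the assumption of a little-endian target such as x86-64, where the bytes at the lowest addresses are the least significant ones. truncate_to_size is a hypothetical name and is not part of libelfin or of the original code.

```cpp
#include <cstdint>

// Keep only the low `size` bytes of a word read from the traced process.
// Assumes a little-endian target: the byte stored at `address` is the
// least significant byte of the word that PTRACE_PEEKDATA returned.
inline std::uint64_t truncate_to_size(std::uint64_t word, unsigned size) {
    if (size >= sizeof(word)) return word;  // whole word requested
    if (size == 0) return 0;                // nothing requested
    const std::uint64_t mask = (std::uint64_t{1} << (size * 8)) - 1;
    return word & mask;
}
```

Inside deref_size, the return statement could then become something like truncate_to_size(ptrace(PTRACE_PEEKDATA, m_pid, address, nullptr), size). Sizes larger than one word would need several reads, which this sketch does not attempt.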
Reading will be handled by the read_variables function in our debugger class:
    void debugger::read_variables() {
        using namespace dwarf;

        auto func = get_function_from_pc(get_pc());

        //...
    }

The first thing we do above is find the function we are currently in. Then we need to loop through the entries in that function to look for variables:

    for (const auto& die : func) {
        if (die.tag == DW_TAG::variable) {
            //...
        }
    }
We get the location information by looking up the DW_AT_location entry in the DIE:

    auto loc_val = die[DW_AT::location];

Next we make sure it's an exprloc and ask libelfin to evaluate the expression for us:

    if (loc_val.get_type() == value::type::exprloc) {
        ptrace_expr_context context {m_pid};
        auto result = loc_val.as_exprloc().evaluate(&context);
Now that we have evaluated the expression, we need to read the contents of the variable. It can be in memory or registers, so we'll handle both cases:
    switch (result.location_type) {
    case expr_result::type::address:
    {
        auto value = read_memory(result.value);
        std::cout << at_name(die) << " (0x" << std::hex << result.value << ") = "
                  << value << std::endl;
        break;
    }

    case expr_result::type::reg:
    {
        auto value = get_register_value_from_dwarf_register(m_pid, result.value);
        std::cout << at_name(die) << " (reg " << result.value << ") = "
                  << value << std::endl;
        break;
    }

    default:
        throw std::runtime_error{"Unhandled variable location"};
    }

As you can see, I simply printed out the value without interpreting it based on the type of the variable. Hopefully this code shows you how you could add support for writing variables, or for searching for variables with a given name.
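If you do want to interpret the raw value, the DW_AT_encoding and DW_AT_byte_size attributes of the variable's base type carry what is needed. Here is a rough sketch of that step for a few encodings. The DW_ATE_* numbers are the ones from the DWARF specification, defined locally so the block compiles without libelfin; format_base_value is a hypothetical helper, and fetching the two attributes from the type DIE is left out.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Base type encodings as numbered in the DWARF specification.
constexpr unsigned DW_ATE_boolean  = 0x02;
constexpr unsigned DW_ATE_float    = 0x04;
constexpr unsigned DW_ATE_signed   = 0x05;
constexpr unsigned DW_ATE_unsigned = 0x07;

// Render a raw word read from memory or a register according to the
// encoding and byte size of its DWARF base type.
inline std::string format_base_value(std::uint64_t raw, unsigned encoding,
                                     unsigned byte_size) {
    if (byte_size == 0 || byte_size > 8) return "<unhandled>";
    const unsigned bits = byte_size * 8;
    const std::uint64_t mask =
        byte_size == 8 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits) - 1;
    raw &= mask;  // ignore anything beyond the variable's own bytes

    switch (encoding) {
    case DW_ATE_signed:
        // Sign-extend from the top bit of the variable's own width.
        if (raw >> (bits - 1)) raw |= ~mask;
        return std::to_string(static_cast<std::int64_t>(raw));
    case DW_ATE_unsigned:
        return std::to_string(raw);
    case DW_ATE_boolean:
        return raw != 0 ? "true" : "false";
    case DW_ATE_float:
        if (byte_size == 4) {
            const std::uint32_t narrow = static_cast<std::uint32_t>(raw);
            float f;
            std::memcpy(&f, &narrow, sizeof f);  // reinterpret the bit pattern
            return std::to_string(f);
        }
        if (byte_size == 8) {
            double d;
            std::memcpy(&d, &raw, sizeof d);
            return std::to_string(d);
        }
        break;
    default:
        break;
    }
    return "<unhandled>";
}
```

In read_variables, the value read in either switch case could be passed through a function like this before printing. Pointers, arrays, and structures would need to walk the type DIEs recursively, which is beyond this sketch.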
Finally we can add this to our command parser:
    else if(is_prefix(command, "variables")) {
        read_variables();
    }

Write some small functions with a few variables, compile them without optimizations and with debug information, and then see if you can read the values of your variables. Try writing to the memory address where a variable is stored and watch how the program's behavior changes.

Nine posts down, one to go! Next time I will discuss some more advanced concepts that may interest you. For now, you can find the code for this post here.