Home > System Tutorial > LINUX > In-depth exploration of source code-level breakpoint technology in Linux debuggers!

In-depth exploration of source code-level breakpoint technology in Linux debuggers!

WBOY
Release: 2024-01-01 21:59:42
forward
918 people have browsed it
Introduction Setting breakpoints on memory addresses is nice, but it doesn't provide the most user-friendly tool. We want to be able to set breakpoints on source code lines and function entry addresses so that we can debug at the same level of abstraction as the code.

This article will add source-level breakpoints to our debugger. With all the features we already support, this is much easier than it initially sounds. We will also add a command to get the type and address of a symbol, which is useful for locating code or data and understanding linking concepts.

Series Index

As subsequent articles are published, these links will gradually take effect.

  1. Preparing the environment
  2. Breakpoint
  3. Registers and Memory
  4. Elves and dwarves
  5. Source code and signals
  6. Source code level step by step execution
  7. Source level breakpoints
  8. Call stack
  9. Read variables
  10. Next steps
Breakpoint DWARF

Elves and dwarves This article describes how DWARF debugging information works and how it can be used to map machine code to high-level source code. Recall that DWARF contains the address range of a function and a line table that allows you to translate code locations between abstraction layers. We will use these functions to implement our breakpoints.

Function entry

Setting breakpoints on function names can be a bit complicated if you consider overloading, member functions, etc., but we will go through all compilation units and search for functions that match the name we are looking for. The DWARF information looks like this:

< 0><0x0000000b>  DW_TAG_compile_unit
                    DW_AT_producer              clang version 3.9.1 (tags/RELEASE_391/final)
                    DW_AT_language              DW_LANG_C_plus_plus
                    DW_AT_name                  /super/secret/path/MiniDbg/examples/variable.cpp
                    DW_AT_stmt_list             0x00000000
                    DW_AT_comp_dir              /super/secret/path/MiniDbg/build
                    DW_AT_low_pc                0x00400670
                    DW_AT_high_pc               0x0040069c

LOCAL_SYMBOLS:
< 1><0x0000002e>    DW_TAG_subprogram
                      DW_AT_low_pc                0x00400670
                      DW_AT_high_pc               0x0040069c
                      DW_AT_name                  foo
                      ...
...
<14><0x000000b0>    DW_TAG_subprogram
                      DW_AT_low_pc                0x00400700
                      DW_AT_high_pc               0x004007a0
                      DW_AT_name                  bar
                      ...

Copy after login

We want to match DW_AT_name and use DW_AT_low_pc (the starting address of the function) to set our breakpoint.

void debugger::set_breakpoint_at_function(const std::string& name) {
for (const auto& cu : m_dwarf.compilation_units()) {
for (const auto& die : cu.root()) {
if (die.has(dwarf::DW_AT::name) && at_name(die) == name) {
auto low_pc = at_low_pc(die);
auto entry = get_line_entry_from_pc(low_pc);
++entry; //skip prologue
set_breakpoint_at_address(entry->address);
}
}
}
}
Copy after login

The only thing that looks a little strange about this code is the entry. The problem is that the function's DW_AT_low_pc does not point to the start address of the function's user code, it points to the start of the prologue. The compiler usually outputs a function's prologue and epilogue, which are used to perform saving and restoring the stack, manipulating the stack pointer, etc. This isn't very useful to us, so we increment the entry line by one to get the first line of user code instead of the prologue. The DWARF line table actually has some functionality for marking the entry as the first line after the function prologue, but not all compilers output it, so I went with the original approach.

Source code lines

To set a breakpoint on a high-level source code line, we need to convert the line number into an address in DWARF. We will iterate through the compilation units, looking for one whose name matches the given file, and then look for the entry corresponding to the given line.

DWARF looks a bit like this:

.debug_line: line number info for a single cu
Source lines (from CU-DIE at .debug_info offset 0x0000000b):
NS new statement, BB new basic block, ET end of text sequence
PE prologue end, EB epilogue begin
IS=val ISA number, DI=val discriminator value
[lno,col] NS BB ET PE EB IS= DI= uri: "filepath"
0x004004a7 [ 1, 0] NS uri: "/super/secret/path/a.hpp"
0x004004ab [ 2, 0] NS
0x004004b2 [ 3, 0] NS
0x004004b9 [ 4, 0] NS
0x004004c1 [ 5, 0] NS
0x004004c3 [ 1, 0] NS uri: "/super/secret/path/b.hpp"
0x004004c7 [ 2, 0] NS
0x004004ce [ 3, 0] NS
0x004004d5 [ 4, 0] NS
0x004004dd [ 5, 0] NS
0x004004df [ 4, 0] NS uri: "/super/secret/path/ab.cpp"
0x004004e3 [ 5, 0] NS
0x004004e8 [ 6, 0] NS
0x004004ed [ 7, 0] NS
0x004004f4 [ 7, 0] NS ET
Copy after login

So if we wanted to set a breakpoint on line 5 of ab.cpp, we would look for the entry associated with line (0x004004e3) and set a breakpoint.

void debugger::set_breakpoint_at_source_line(const std::string& file, unsigned line) {
for (const auto& cu : m_dwarf.compilation_units()) {
if (is_suffix(file, at_name(cu.root()))) {
const auto& lt = cu.get_line_table();
for (const auto& entry : lt) {
if (entry.is_stmt && entry.line == line) {
set_breakpoint_at_address(entry.address);
return;
}
}
}
}
}
Copy after login

I did the is_suffix hack here so you can type c.cpp for a/b/c.cpp. Of course you should actually use a case-sensitive path handling library or something, but I'm lazy. entry.is_stmt checks whether the line table entry is marked as the beginning of a statement, which is set by the compiler based on the address it thinks is the best target for the breakpoint.

Symbol search

When we are at the object file level, symbols are king. Functions are named with symbols, global variables are named with symbols, you get a symbol, we get a symbol, everyone gets a symbol. In a given object file, some symbols may reference other object files or shared libraries, and the linker will create an executable program from the symbol references.

Symbols can be looked up in a properly named symbol table, which is stored in the ELF section of the binary file. Fortunately, libelfin has a nice interface for doing this, so we don't need to handle all the ELF stuff ourselves. To give you an idea of ​​what we're dealing with, here's a dump of the .symtab portion of the binary, generated by readelf:

Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000400238 0 SECTION LOCAL DEFAULT 1
2: 0000000000400254 0 SECTION LOCAL DEFAULT 2
3: 0000000000400278 0 SECTION LOCAL DEFAULT 3
4: 00000000004002c8 0 SECTION LOCAL DEFAULT 4
5: 0000000000400430 0 SECTION LOCAL DEFAULT 5
6: 00000000004004e4 0 SECTION LOCAL DEFAULT 6
7: 0000000000400508 0 SECTION LOCAL DEFAULT 7
8: 0000000000400528 0 SECTION LOCAL DEFAULT 8
9: 0000000000400558 0 SECTION LOCAL DEFAULT 9
10: 0000000000400570 0 SECTION LOCAL DEFAULT 10
11: 0000000000400714 0 SECTION LOCAL DEFAULT 11
12: 0000000000400720 0 SECTION LOCAL DEFAULT 12
13: 0000000000400724 0 SECTION LOCAL DEFAULT 13
14: 0000000000400750 0 SECTION LOCAL DEFAULT 14
15: 0000000000600e18 0 SECTION LOCAL DEFAULT 15
16: 0000000000600e20 0 SECTION LOCAL DEFAULT 16
17: 0000000000600e28 0 SECTION LOCAL DEFAULT 17
18: 0000000000600e30 0 SECTION LOCAL DEFAULT 18
19: 0000000000600ff0 0 SECTION LOCAL DEFAULT 19
20: 0000000000601000 0 SECTION LOCAL DEFAULT 20
21: 0000000000601018 0 SECTION LOCAL DEFAULT 21
22: 0000000000601028 0 SECTION LOCAL DEFAULT 22
23: 0000000000000000 0 SECTION LOCAL DEFAULT 23
24: 0000000000000000 0 SECTION LOCAL DEFAULT 24
25: 0000000000000000 0 SECTION LOCAL DEFAULT 25
26: 0000000000000000 0 SECTION LOCAL DEFAULT 26
27: 0000000000000000 0 SECTION LOCAL DEFAULT 27
28: 0000000000000000 0 SECTION LOCAL DEFAULT 28
29: 0000000000000000 0 SECTION LOCAL DEFAULT 29
30: 0000000000000000 0 SECTION LOCAL DEFAULT 30
31: 0000000000000000 0 FILE LOCAL DEFAULT ABS init.c
32: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
33: 0000000000600e28 0 OBJECT LOCAL DEFAULT 17 __JCR_LIST__
34: 00000000004005a0 0 FUNC LOCAL DEFAULT 10 deregister_tm_clones
35: 00000000004005e0 0 FUNC LOCAL DEFAULT 10 register_tm_clones
36: 0000000000400620 0 FUNC LOCAL DEFAULT 10 __do_global_dtors_aux
37: 0000000000601028 1 OBJECT LOCAL DEFAULT 22 completed.6917
38: 0000000000600e20 0 OBJECT LOCAL DEFAULT 16 __do_global_dtors_aux_fin
39: 0000000000400640 0 FUNC LOCAL DEFAULT 10 frame_dummy
40: 0000000000600e18 0 OBJECT LOCAL DEFAULT 15 __frame_dummy_init_array_
41: 0000000000000000 0 FILE LOCAL DEFAULT ABS /super/secret/path/MiniDbg/
42: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
43: 0000000000400818 0 OBJECT LOCAL DEFAULT 14 __FRAME_END__
44: 0000000000600e28 0 OBJECT LOCAL DEFAULT 17 __JCR_END__
45: 0000000000000000 0 FILE LOCAL DEFAULT ABS
46: 0000000000400724 0 NOTYPE LOCAL DEFAULT 13 __GNU_EH_FRAME_HDR
47: 0000000000601000 0 OBJECT LOCAL DEFAULT 20 _GLOBAL_OFFSET_TABLE_
48: 0000000000601028 0 OBJECT LOCAL DEFAULT 21 __TMC_END__
49: 0000000000601020 0 OBJECT LOCAL DEFAULT 21 __dso_handle
50: 0000000000600e20 0 NOTYPE LOCAL DEFAULT 15 __init_array_end
51: 0000000000600e18 0 NOTYPE LOCAL DEFAULT 15 __init_array_start
52: 0000000000600e30 0 OBJECT LOCAL DEFAULT 18 _DYNAMIC
53: 0000000000601018 0 NOTYPE WEAK DEFAULT 21 data_start
54: 0000000000400710 2 FUNC GLOBAL DEFAULT 10 __libc_csu_fini
55: 0000000000400570 43 FUNC GLOBAL DEFAULT 10 _start
56: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
57: 0000000000400714 0 FUNC GLOBAL DEFAULT 11 _fini
58: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@@GLIBC_
59: 0000000000400720 4 OBJECT GLOBAL DEFAULT 12 _IO_stdin_used
60: 0000000000601018 0 NOTYPE GLOBAL DEFAULT 21 __data_start
61: 00000000004006a0 101 FUNC GLOBAL DEFAULT 10 __libc_csu_init
62: 0000000000601028 0 NOTYPE GLOBAL DEFAULT 22 __bss_start
63: 0000000000601030 0 NOTYPE GLOBAL DEFAULT 22 _end
64: 0000000000601028 0 NOTYPE GLOBAL DEFAULT 21 _edata
65: 0000000000400670 44 FUNC GLOBAL DEFAULT 10 main
66: 0000000000400558 0 FUNC GLOBAL DEFAULT 9 _init
Copy after login

You can see a lot of symbols used to set up the environment in the object file, and finally you can see the main symbol.

We are interested in the symbol's type, name and value (address). We have a symbol_type enum of that type, and use a std::string as the name and std::uintptr_t as the address:

enum class symbol_type {
notype, // No type (e.g., absolute symbol)
object, // Data object
func, // Function entry point
section, // Symbol is associated with a section
file, // Source file associated with the
}; // object file
std::string to_string (symbol_type st) {
switch (st) {
case symbol_type::notype: return "notype";
case symbol_type::object: return "object";
case symbol_type::func: return "func";
case symbol_type::section: return "section";
case symbol_type::file: return "file";
}
}
struct symbol {
symbol_type type;
std::string name;
std::uintptr_t addr;
};
Copy after login

We need to map the symbol type we get from libelfin to our enum, because we don't want dependencies to break this interface. Luckily, I picked the same name for everything, so it was easy:

symbol_type to_symbol_type(elf::stt sym) {
switch (sym) {
case elf::stt::notype: return symbol_type::notype;
case elf::stt::object: return symbol_type::object;
case elf::stt::func: return symbol_type::func;
case elf::stt::section: return symbol_type::section;
case elf::stt::file: return symbol_type::file;
default: return symbol_type::notype;
}
};
Copy after login

Finally we have to find the symbol. For illustration purposes, I loop through the ELF portion of the symbol table and collect any symbols I find there into a std::vector. A smarter implementation could establish a mapping from name to symbol so that you only have to look at the data once.

std::vector debugger::lookup_symbol(const std::string& name) {
std::vector syms;
for (auto &sec : m_elf.sections()) {
if (sec.get_hdr().type != elf::sht::symtab && sec.get_hdr().type != elf::sht::dynsym)
continue;
for (auto sym : sec.as_symtab()) {
if (sym.get_name() == name) {
auto &d = sym.get_data();
syms.push_back(symbol{to_symbol_type(d.type()), sym.get_name(), d.value});
}
}
}
return syms;
}
Copy after login
添加命令

一如往常,我们需要添加一些更多的命令来向用户暴露功能。对于断点,我使用 GDB 风格的接口,其中断点类型是通过你传递的参数推断的,而不用要求显式切换:

  • 0x -> 断点地址
  • : -> 断点行号
  •  -> 断点函数名
else if(is_prefix(command, "break")) {
if (args[1][0] == '0' && args[1][1] == 'x') {
std::string addr {args[1], 2};
set_breakpoint_at_address(std::stol(addr, 0, 16));
}
else if (args[1].find(':') != std::string::npos) {
auto file_and_line = split(args[1], ':');
set_breakpoint_at_source_line(file_and_line[0], std::stoi(file_and_line[1]));
}
else {
set_breakpoint_at_function(args[1]);
}
}
Copy after login

对于符号,我们将查找符号并打印出我们发现的任何匹配项:

else if(is_prefix(command, "symbol")) {
auto syms = lookup_symbol(args[1]);
for (auto&& s : syms) {
std::cout << s.name << ' ' << to_string(s.type) << " 0x" << std::hex << s.addr << std::endl;
}
}
Copy after login
测试一下

在一个简单的二进制文件上启动调试器,并设置源代码级别的断点。在一些 foo 函数上设置一个断点,看到我的调试器停在它上面是我这个项目最有价值的时刻之一。

符号查找可以通过在程序中添加一些函数或全局变量并查找它们的名称来进行测试。请注意,如果你正在编译 C++ 代码,你还需要考虑名称重整。

本文就这些了。下一次我将展示如何向调试器添加堆栈展开支持。

你可以在这里找到这篇文章的代码。

The above is the detailed content of In-depth exploration of source code-level breakpoint technology in Linux debuggers!. For more information, please follow other related articles on the PHP Chinese website!

source:linuxprobe.com
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template