How to extract text separated by different HTML tags in Cheerio

Question

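I'm trying to extract the specific text strings below as separate outputs (scraping them from the sample HTML), for example:

let text = "That's the first text I need";
let text2 = "The second text I need";
let text3 = "The third text I need";

I really don't know how to get text that is separated by different HTML tags.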

P粉198670603 · Answer

Try something like this and see if it works:

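// Paste the sample HTML from the question here.
const html = `your sample html above`;

// Parse the string into a DOM document (DOMParser is a browser API).
const domdoc = new DOMParser().parseFromString(html, "text/html");
// Select every text node that has no <span> ancestor, in document order.
const result = domdoc.evaluate(
  "//text()[not(ancestor::span)]",
  domdoc,
  null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
  null
);

for (let i = 0; i < result.snapshotLength; i++) {
  const target = result.snapshotItem(i).textContent.trim();
  // Skip text nodes that contain only whitespace.
  if (target.length > 0) {
    console.log(target);
  }
}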

With your sample HTML, the output should be:

"That's the first text I need"
"The second text I need"
"The third text I need"

P粉386318086 · Answer

You can iterate over the child nodes of the <p> and keep every node whose nodeType === Node.TEXT_NODE and whose content is not just whitespace:

// Log each non-empty text node that is a direct child of the first <p>.
for (const e of document.querySelector("p").childNodes) {
  if (e.nodeType === Node.TEXT_NODE && e.textContent.trim()) {
    console.log(e.textContent.trim());
  }
}

// Or collect them into an array:
const result = [...document.querySelector("p").childNodes]
  .filter(e =>
    e.nodeType === Node.TEXT_NODE && e.textContent.trim()
  )
  .map(e => e.textContent.trim());
console.log(result);

Rendered output of the snippet's sample HTML:

Count: 31
Something: That's the first text I need Something2: The second text I need
Something3: The third text I need