Why is the list "spans" never updated? I don't understand why the code gets stuck in an infinite loop.
pdf: https://www.sil.org/system/files/reapdata/62/99/18/62991811720566250411942290005522370655/40337_02.pdf
"Block" example: https://jumpshare.com/s/y393jobqjfiye51gkexn
import fitz

doc = fitz.open("cubeo/40337_02.pdf")
page = doc[3]
blocks = page.get_text("dict", flags=fitz.TEXTFLAGS_TEXT)["blocks"]

for block in blocks:
    entries = []
    if len(block["lines"]) > 3:  # ignore captions and page numbers
        for line in block["lines"]:
            spans = []
            for span in line["spans"]:
                spans.append({"text": span["text"].replace("�", " "), "size": int(span["size"]), "font": span["font"]})

            # While there are spans left
            while True:
                # Delimits where an entry starts
                entry_first_position = None
                for i, span in enumerate(spans):
                    if span["font"] == "Sb&cuSILCharis-Bold":
                        entry_first_position = i
                        break
                if entry_first_position is not None:
                    # Delimits where an entry ends
                    entry_last_position = None
                    for i, span in enumerate(spans[entry_first_position:], start=entry_first_position):
                        if span["font"] == "Sb&cuSILCharis-Bold":
                            entry_last_position = i
                            break
                    if entry_last_position is not None:
                        # Whole entry is added as a list
                        append_list = spans[entry_first_position:entry_last_position]
                        entries.append(append_list)
                        spans = spans[:entry_first_position] + spans[entry_last_position:]
                    else:
                        break
                else:
                    break
            print(spans)
What I expect is that print(spans) outputs "[]". However, the code never reaches this point.
The second search starts at entry_first_position itself:
for i, span in enumerate(spans[entry_first_position:], start=entry_first_position):
so it does not skip the first occurrence of span["font"] == "Sb&cuSILCharis-Bold"; it matches that same span again immediately. As a result entry_last_position == entry_first_position, the slice spans[entry_first_position:entry_last_position] is empty, nothing is deleted from spans, and you're stuck in an infinite loop. Change it to
for i, span in enumerate(spans[entry_first_position+1:], start=entry_first_position+1):
so that the search starts at the next position in the list.
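To try the fix without the PDF, here is a minimal sketch that runs the corrected search on hand-made spans. The helper names find_bold and extract_entries, the "Regular" font name and the sample texts are all made up for illustration; only the bold font name comes from the question. It also makes one assumption the original code does not: when no later bold span exists, the last entry is taken to run to the end of the list. Without that, the final entry would never be removed and spans would not end up empty.

```python
BOLD = "Sb&cuSILCharis-Bold"  # headword font name taken from the question


def find_bold(spans, start=0):
    """Return the index of the first bold span at or after `start`, else None."""
    for i in range(start, len(spans)):
        if spans[i]["font"] == BOLD:
            return i
    return None


def extract_entries(spans):
    """Split `spans` into entries, each starting at a bold span."""
    entries = []
    while True:
        first = find_bold(spans)
        if first is None:
            break  # no headword left
        # Start one past `first`, so the same span is not matched again.
        last = find_bold(spans, first + 1)
        if last is None:
            # Assumption: the final entry runs to the end of the list.
            last = len(spans)
        entries.append(spans[first:last])
        # `last` is always greater than `first`, so the list shrinks every pass.
        spans = spans[:first] + spans[last:]
    return entries, spans


# Made-up spans standing in for what page.get_text("dict") would return.
sample = [
    {"text": "abe", "font": BOLD},
    {"text": "n. sun", "font": "Regular"},
    {"text": "abu", "font": BOLD},
    {"text": "v. to go", "font": "Regular"},
]
entries, leftover = extract_entries(sample)
print(len(entries), leftover)
```

Because the second search always starts at first + 1, each pass removes at least one span, so the loop is guaranteed to terminate.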