I'm trying to extract table data from a few thousand HTML files (or the site pages they came from), but the tables aren't wrapped in divs that would make this easy, and I'm new to Beautiful Soup. Right now I'm converting the HTML to CSV, editing every file by hand, and importing the results into my database to create the tables, but I'd rather pull the data directly from the files I already have.
    <body style="margin-top:140px;">
      <div id="container">
        <!-- Left div -->
        <div> </div>
        <!-- Center div -->
        <div>
          <!-- Image Link -->
          <a href="http://www.website.com"><img src="http://website.com/wp-content/uploads/2016/12/Blue-Transparent.png" style = "max-width:100%; max-height:120px;" alt="Center Banner"></a>
        </div>
        <!-- Right div -->
        <div> </div>
      </div>
      <A Name = "Top"></A>
      <H1>5k Run</H1>
      <H1>Overall Finish List</H1>
      <H2>September 24, 2022</H2>
      <HR noshade>
      <B><I> </I></B>
      <HR noshade>
      <table border=0 cellpadding=0 cellspacing=0 class="racetable">
        <tr> <td class=h01 colspan="9"><H2>1st Alarm 5k</H2></td> </tr>
        <tr> <td class=h11>Place</td> <td class=h12>Name</td> <td class=h12>City</td> <td class=h11>Bib No</td> <td class=h11>Age</td> <td class=h11>Gender</td> <td class=h11>Age Group</td> <td class=h11>Total Time</td> <td class=h11>Pace</td> </tr>
        <tr> <td class=d01>1</td> <td class=d02>Runner 1</td> <td class=d02>ANYTOWN PA</td> <td class=d01>390</td> <td class=d01>52</td> <td class=d01>M</td> <td class=d01>1:Overall</td> <td class=d01> 18:43.93</td> <td class=d01>6:03/M</td> </tr>
        <tr> <td class=d01>2</td> <td class=d02>Runner 2</td> <td class=d02>ANYTOWN PA</td> <td class=d01>380</td> <td class=d01>33</td> <td class=d01>M</td> <td class=d01>1:19-39</td> <td class=d01> 19:31.27</td> <td class=d01>6:18/M</td> </tr>
        <tr> <td class=d01>3</td> <td class=d02>Runner 3</td> <td class=d02>ANYTOWN PA</td> <td class=d01>389</td> <td class=d01>65</td> <td class=d01>F</td> <td class=d01>1:Overall</td> <td class=d01> 45:45.20</td> <td class=d01>14:46/M</td> </tr>
        <tr> <td class=d01>4</td> <td class=d02>Runner 4</td> <td class=d02>ANYTOWN PA</td> <td class=d01>381</td> <td class=d01>18</td> <td class=d01>F</td> <td class=d01>1: 1-18</td> <td class=d01> 53:28.84</td> <td class=d01>17:15/M</td> </tr>
        <tr> <td class=d01>5</td> <td class=d02>Runner 5</td> <td class=d02>ANYTOWN PA</td> <td class=d01>382</td> <td class=d01>41</td> <td class=d01>F</td> <td class=d01>1:40-59</td> <td class=d01> 53:30.48</td> <td class=d01>17:16/M</td> </tr>
        <tr> <td class=d01>6</td> <td class=d02>Runner 6</td> <td class=d02>ANYTOWN PA</td> <td class=d01>384</td> <td class=d01>14</td> <td class=d01>M</td> <td class=d01>1: 1-18</td> <td class=d01> 57:38.66</td> <td class=d01>18:36/M</td> </tr>
        <tr> <td class=d01>7</td> <td class=d02>Runner 7</td> <td class=d02>ANYTOWN PA</td> <td class=d01>385</td> <td class=d01>72</td> <td class=d01>F</td> <td class=d01>1:60-99</td> <td class=d01> 57:40.11</td> <td class=d01>18:36/M</td> </tr>
      </table>
      <HR noshade>
      <p>
      <!-- 0c17 22.0 2e9 -->
    </BODY>
    </HTML>
I've tried adding divs without much success.
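A minimal sketch of the batch conversion described in the question, assuming every file is a saved page like the sample above: a single `<table class="racetable">`, a header row made of `h11`/`h12` cells, and runner rows made of `d01`/`d02` cells. The function names `extract_rows` and `convert_folder`, the extra `Race` and `Source File` columns, and UTF-8 file encoding are assumptions for illustration, not something from the original post. It needs the `bs4` package installed.

```python
import csv
from pathlib import Path

from bs4 import BeautifulSoup


def extract_rows(html_text, source):
    """Return one dict per runner in the page's racetable, keyed by the header cells."""
    soup = BeautifulSoup(html_text, "html.parser")
    table = soup.find("table", class_="racetable")
    if table is None:
        return []  # no results table in this file

    # Race title from the h01 cell, e.g. "1st Alarm 5k".
    title_cell = table.find("td", class_="h01")
    race = title_cell.get_text(strip=True) if title_cell is not None else ""

    headers = None
    rows = []
    for tr in table.find_all("tr"):
        if tr.find("td", class_="h11") is not None:
            # Header row: Place, Name, City, Bib No, ...
            headers = [td.get_text(strip=True) for td in tr.find_all("td")]
        elif headers is not None and tr.find("td", class_="d01") is not None:
            # Runner row: pair each cell's text with its column header.
            values = [td.get_text(strip=True) for td in tr.find_all("td")]
            row = dict(zip(headers, values))
            row["Race"] = race
            row["Source File"] = source
            rows.append(row)
    return rows


def convert_folder(folder, out_csv):
    """Parse every *.html file in folder and write all runners to a single CSV file."""
    all_rows = []
    for path in sorted(Path(folder).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        all_rows.extend(extract_rows(text, path.name))
    if not all_rows:
        return 0

    # Columns follow the first row; extra columns in other pages are dropped,
    # and missing ones are left empty.
    fieldnames = list(all_rows[0].keys())
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(all_rows)
    return len(all_rows)
```

Calling `convert_folder("race_pages", "results.csv")` (both placeholder paths) would produce one CSV for all files, which can then be imported into the database in one step instead of editing each file by hand.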
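BeautifulSoup doesn't need divs to find content: you can search for any tag (table, tr, td) and filter on its attributes, such as the class values this page already uses (racetable, h11, d01, d02).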
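A short sketch of that idea on a trimmed copy of the sample page: the table, the race name, the date, and the column headers are all located by tag name and class alone. It assumes the `bs4` package is installed and uses Python's built-in `html.parser`, which lowercases tag names, so `<H2>` is found as `"h2"`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample page: nothing here is wrapped in a div.
html = """
<H1>5k Run</H1>
<H2>September 24, 2022</H2>
<table border=0 cellpadding=0 cellspacing=0 class="racetable">
  <tr> <td class=h01 colspan="9"><H2>1st Alarm 5k</H2></td> </tr>
  <tr> <td class=h11>Place</td> <td class=h12>Name</td> <td class=h12>City</td> </tr>
  <tr> <td class=d01>1</td> <td class=d02>Runner 1</td> <td class=d02>ANYTOWN PA</td> </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Search by tag name plus class; whatever layout surrounds the table is irrelevant.
table = soup.find("table", class_="racetable")
race_name = table.find("td", class_="h01").get_text(strip=True)

# html.parser lowercases tag names, so <H2> is matched as "h2".
# The first h2 in document order is the date line above the table.
event_date = soup.find("h2").get_text(strip=True)

# A list of classes matches cells that have any of them, in document order.
headers = [td.get_text(strip=True) for td in table.find_all("td", class_=["h11", "h12"])]

print(race_name, "|", event_date)
print(headers)
```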
Assuming what you want to retrieve from the HTML shown is each runner's row, you could do something like this.
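One way to do that, sketched on the header row and two runner rows from the sample: treat any row containing `d01` cells as a runner and pair its cell text with the header row. Matching on the placeholder names ("Runner 1", ...) would not carry over to real data, so this keys on the cell classes instead; that those classes are the same in all of your files is an assumption.

```python
from bs4 import BeautifulSoup

# Header row plus two runner rows copied from the sample page.
html = """
<table class="racetable">
  <tr> <td class=h01 colspan="9"><H2>1st Alarm 5k</H2></td> </tr>
  <tr> <td class=h11>Place</td> <td class=h12>Name</td> <td class=h12>City</td> <td class=h11>Bib No</td> <td class=h11>Age</td> <td class=h11>Gender</td> <td class=h11>Age Group</td> <td class=h11>Total Time</td> <td class=h11>Pace</td> </tr>
  <tr> <td class=d01>1</td> <td class=d02>Runner 1</td> <td class=d02>ANYTOWN PA</td> <td class=d01>390</td> <td class=d01>52</td> <td class=d01>M</td> <td class=d01>1:Overall</td> <td class=d01> 18:43.93</td> <td class=d01>6:03/M</td> </tr>
  <tr> <td class=d01>2</td> <td class=d02>Runner 2</td> <td class=d02>ANYTOWN PA</td> <td class=d01>380</td> <td class=d01>33</td> <td class=d01>M</td> <td class=d01>1:19-39</td> <td class=d01> 19:31.27</td> <td class=d01>6:18/M</td> </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="racetable")

# The header row is the <tr> that holds the h11/h12 cells.
header_row = table.find("td", class_="h11").find_parent("tr")
headers = [td.get_text(strip=True) for td in header_row.find_all("td")]

runners = []
for row in table.find_all("tr"):
    # Runner rows are built from data cells (class d01/d02); skip title and header rows.
    if row.find("td", class_="d01") is None:
        continue
    # strip=True also removes the leading space in cells like " 18:43.93".
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    runners.append(dict(zip(headers, cells)))

for runner in runners:
    print(runner["Place"], runner["Name"], runner["Total Time"], runner["Pace"])
```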
The printed result looks like this