So, here's the situation: I have a CSV file that looks like this:
show_id title cast 1 Batman Robert Pattinson, Collin Farrel, Zoë Kravitz 2 Twilight Robert Pattinson, Kristen Stewart 3 Ava Jessica Chastain, Collin Farrel
What I need to do is open this CSV file in a Python function and do some processing on spaces etc.
Then I need to upload it into a SQL database (anything I want, but I chose MySQL), which is no problem.
question
My main problem is that I then need (as my challenge requires it) to create a query that counts the number of times an actor appears in all movies in a list. So, in this case, the query should look like this:
演员 电影数量 Robert Pattinson 2 Collin Farrel 2 Zoë Kravitz 1 Kristen Stewart 1 Jessica Chastain 1
As you can see, I don't have a name that I can search with LIKE or CONTAINS. So, what should I do? Because in the CSV, the cast list for each movie has more than one actor, and I'm saving them in a varchar or text type in the database, I have multiple actors per row.
Or should I create another table with a foreign key for actors and movies? Or is it not possible to do this in MySQL, but maybe possible in other SQLs?
If you are looking for something performance efficient, you should split the data (create 3 tables in total - movies, actors, credits) and use credits to connect the actors with the movies and then write something with the join Simple SQL, as shown below:
You can also do it another way using https://sebhastian.com/mysql-split-string/ or using psql/plsql and stored procedures. I would just split the data if possible.
cheers.