given: I have a source table that contains different attributes for footwear in multiple boolean fields that indicate whether the footwear for that attribute is available. 1=Available, 0=Not available. Sample data is as follows -
Prod_id | s_7 | s_8 | s_9 | s_10 | c_white | c_black | c_blue | c_brown | c_other | t_shoes | t_Sandals | t_slippers | ... |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
001 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | ... |
002 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | ... |
003 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | ... |
004 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | ... |
005 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | ... |
006 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | ... |
007 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... |
008 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | ... |
009 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | ... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
There are also some attribute columns, such as o_casual, o_formal, o_ethnic, m_canvas, m_leather, m_silicon, p_plain, p_textured, p_funky, etc., and all attribute columns have binary values of their respective prod_id. There are over 50k prod_ids.
ask: I need a rotated distribution of the availability of each pair of attributes, and the cell value should indicate the count of prod_id for a given pair at the intersection. For example, a prod_id with attributes "s_8" and "c_black" has a count of 4 (where s_8=1 and c_black=1). It should appear in the view below, which can be useful for further analysis.
Attributes | s_7 | s_8 | s_9 | s_10 | c_white | c_black | c_blue | c_brown | c_other | t_shoes | t_Sandals | t_slippers | ... |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
s_7 | 3 | 2 | 2 | 0 | 2 | 1 | 3 | 0 | 1 | 0 | 1 | 2 | ... |
s_8 | 2 | 6 | 4 | 2 | 2 | 4 | 3 | 3 | 4 | 3 | 1 | 2 | ... |
s_9 | 2 | 4 | 7 | 3 | 2 | 5 | 4 | 3 | 4 | 3 | 2 | 3 | ... |
s_10 | 0 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
c_white | 2 | ... | ... | ||||||||||
c_black | 1 | ... | ... | ||||||||||
c_blue | 3 | ... | ... | ||||||||||
c_Brown | 0 | ... | ... | ||||||||||
c_other | 1 | ... | ... | ||||||||||
t_shoes | 0 | ... | ... | ||||||||||
t_Sandals | 1 | ... | ... | ||||||||||
t_slippers | 2 | ... | ... | ||||||||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Intuitively, the output table is mirrored across the diagonal, and the diagonal cells will contain the total number of prod_ids for that particular attribute.
I am new to SQL. Please help me find a logic through which I can achieve the desired pivot view. Note that the data is large, so consider scalable solutions. If the question is unclear and you need some explanation to clarify, please let me know.
You can use a query like -
Easily built in the language of your choice. Here is a simple example using PHP -