(1) MNIST(Modified National Institute of Standards and Technology)(1998):
- has 70,000 handwritten digits[0~9] by 28x28 pixels each. *60,000 for train and 10,000 for test.
- is MNIST() in PyTorch.
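As a minimal sketch of what "is MNIST() in PyTorch" looks like in practice: the class lives in `torchvision.datasets`. The `root="data"` directory and the helper names `load_mnist` and `check_mnist` are assumptions for illustration, not part of the library.

```python
# Split sizes as stated in the description above.
MNIST_SIZES = {"train": 60_000, "test": 10_000}


def load_mnist(root="data", train=True):
    """Return the MNIST train or test split, downloading it if needed.

    `root` is a hypothetical local directory. torchvision is imported
    lazily so that the rest of this snippet works without it installed.
    """
    from torchvision.datasets import MNIST

    return MNIST(root=root, train=train, download=True)


def check_mnist(dataset, train=True):
    """Return True if the loaded split has the size stated above."""
    expected = MNIST_SIZES["train" if train else "test"]
    return len(dataset) == expected
```

With no `transform` passed, each item of the returned dataset is an `(image, label)` pair, where the image is a 28x28 PIL image and the label is the digit.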
(2) EMNIST(Extended MNIST)(2017):
- has the handwritten characters(digits[0~9] and alphabet letters[A~Z][a~z]) by 28x28 pixels each, split into 6 datasets(ByClass, ByMerge, Balanced, Letters, Digits and MNIST):
*Memos:
  - ByClass has 814,255 characters(digits[0~9] and alphabet letters[A~Z][a~z]). *697,932 for train and 116,323 for test.
  - ByMerge has 814,255 characters(digits[0~9] and alphabet letters[A~Z][a, b, d~h, n, q, r, t]). *697,932 for train and 116,323 for test.
  - Balanced has 131,600 characters(digits[0~9] and alphabet letters[A~Z][a, b, d~h, n, q, r, t]). *112,800 for train and 18,800 for test.
  - Letters has 145,600 alphabet letters[a~z]. *124,800 for train and 20,800 for test.
  - Digits has 280,000 digits[0~9]. *240,000 for train and 40,000 for test.
  - MNIST has 70,000 digits[0~9]. *60,000 for train and 10,000 for test.
- is EMNIST() in PyTorch.
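The six EMNIST datasets map onto the `split` argument of `torchvision.datasets.EMNIST`. The sketch below copies the train/test sizes from the list above and checks that they add up to the stated totals. The helper names and the `root="data"` directory are assumptions for illustration.

```python
# (train, test) sizes per split, copied from the description above.
# The keys are the lowercase names that EMNIST's `split` argument accepts.
EMNIST_SPLITS = {
    "byclass": (697_932, 116_323),
    "bymerge": (697_932, 116_323),
    "balanced": (112_800, 18_800),
    "letters": (124_800, 20_800),
    "digits": (240_000, 40_000),
    "mnist": (60_000, 10_000),
}


def total_size(split):
    """Return train + test size for one EMNIST split."""
    train, test = EMNIST_SPLITS[split]
    return train + test


def load_emnist(split, root="data", train=True):
    """Return one EMNIST split, downloading it if needed.

    `root` is a hypothetical local directory. torchvision is imported
    lazily so that the size table above is usable without it.
    """
    if split not in EMNIST_SPLITS:
        raise ValueError(f"unknown EMNIST split: {split!r}")
    from torchvision.datasets import EMNIST

    return EMNIST(root=root, split=split, train=train, download=True)
```

For example, `load_emnist("balanced")` would request the Balanced training set, and `total_size("balanced")` gives the 131,600 figure listed for it.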
(3) QMNIST(2019):
- has 120,000 handwritten digits[0~9] by 28x28 pixels each. *60,000 for train and 60,000 for test.
- is an extended MNIST. *I don't know what the Q in QMNIST stands for.
- is QMNIST() in PyTorch.
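`torchvision.datasets.QMNIST` chooses its subset through a `what` argument rather than only a `train` flag. A rough sketch, where the helper name `load_qmnist` and `root="data"` are assumptions, and the `test10k`/`test50k` sizes are taken from their names as described in the torchvision documentation:

```python
# Sizes of the subsets selectable via `what`.
# "train" and "test" match the 60,000 + 60,000 stated above;
# "test10k" and "test50k" are the two parts of the 60,000 test set.
QMNIST_SUBSETS = {
    "train": 60_000,
    "test": 60_000,
    "test10k": 10_000,
    "test50k": 50_000,
}


def load_qmnist(what="train", root="data"):
    """Return one QMNIST subset, downloading it if needed.

    `what` may also be "nist" (all NIST digits), which has no entry
    in the size table above. `root` is a hypothetical local directory.
    """
    if what not in QMNIST_SUBSETS and what != "nist":
        raise ValueError(f"unknown QMNIST subset: {what!r}")
    from torchvision.datasets import QMNIST

    return QMNIST(root=root, what=what, download=True)
```

Here `what="test10k"` is the part of the test set that corresponds to the original MNIST test set, and `what="test50k"` is the remaining part.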
(4) ETLCDB(Electrotechnical Laboratory Character Database)(2011):
- has the handwritten or machine-printed numerals, symbols, alphabet letters and Japanese characters split into 9 datasets(ETL-1, ETL-2, ETL-3, ETL-4, ETL-5, ETL-6, ETL-7, ETL-8 and ETL-9):
*Memos:
  - ETL1 has 141,319 characters(digits[0~9], alphabet letters[A~Z], symbols[-*/=()・,?'] and Katakana letters[ア~ン]).
  - ETL2 has 52,796 characters(digits[0~9], alphabet letters[A~Z], symbols, Katakana letters[ア~ン], Hiragana letters[あ~ん] and Kanji letters).
  - ETL3 has 9,600 characters(digits[0~9], alphabet letters[A~Z] and symbols[¥ -*/=()・,_▾]).
  - ETL4 has 6,120 Hiragana letters[あ~ん].
  - ETL5 has 10,608 Katakana letters[ア~ン].
  - ETL6 has 52,796 characters(digits[0~9], alphabet letters[A~Z][a~z], symbols and Katakana letters[ア~ン]).
  - ETL7(ETL7L and ETL7S) has 16,800 characters.
  - ETL8(ETL8G and ETL8B2) has 152,960 characters.
  - ETL9(ETL9G and ETL9B) has 607,200 characters.
- isn't in PyTorch so we need to download it from etlcdb.
(5) Kuzushiji(2018):
- has the cursive style of Japanese characters split into 3 datasets(Kuzushiji-MNIST, Kuzushiji-49 and Kuzushiji-Kanji):
*Memos:
  - Kuzushiji-MNIST has characters by 28x28 pixels each.
  - Kuzushiji-49 has characters by 28x28 pixels each.
  - Kuzushiji-Kanji has the imbalanced 140,424 Kanji characters by 64x64 pixels each.
- is KMNIST() in PyTorch but it only has Kuzushiji-MNIST.
(6) Moving MNIST(2015):
- has 10,000 videos by 64x64 pixels each. *Each video has 20 frames with 2 moving digits.
- is MovingMNIST() in PyTorch.
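`torchvision.datasets.MovingMNIST` has no `train` flag. It splits each video along the frame axis with `split` and `split_ratio`: `split="train"` keeps the first `split_ratio` frames and `split="test"` keeps the rest. A small sketch of that arithmetic, with helper names and `root="data"` as assumptions for illustration:

```python
# Figures from the description above.
NUM_VIDEOS = 10_000
FRAMES_PER_VIDEO = 20


def frames_in_split(split=None, split_ratio=10):
    """Return how many frames of each video a given split keeps."""
    if split is None:
        return FRAMES_PER_VIDEO  # full 20-frame videos
    if split == "train":
        return split_ratio  # first `split_ratio` frames
    if split == "test":
        return FRAMES_PER_VIDEO - split_ratio  # remaining frames
    raise ValueError(f"unknown split: {split!r}")


def load_moving_mnist(root="data", split=None, split_ratio=10):
    """Return Moving MNIST, downloading it if needed.

    `root` is a hypothetical local directory. torchvision is imported
    lazily so that the arithmetic above is usable without it.
    """
    from torchvision.datasets import MovingMNIST

    return MovingMNIST(
        root=root, split=split, split_ratio=split_ratio, download=True
    )
```

With the default `split=None`, each item should be one whole video, so the dataset length is expected to match the 10,000 videos stated above.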