References
1 Simonyan K., Zisserman A.: Two-stream convolutional networks for action
recognition in videos. Proceedings of the 27th International
Conference on Neural Information Processing Systems. 568-576
(2014). https://dl.acm.org/doi/10.5555/2968826.2968890
2 Feichtenhofer C., Pinz A., Wildes R.P.: Spatiotemporal multiplier
networks for video action recognition. 2017 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR). 4768-4777
(2017). https://doi.org/10.1109/CVPR.2017.787
3 Wang L., Xiong Y., Wang Z., Qiao Y., Lin D., Tang X., Van Gool L.:
Temporal segment networks: Towards good practices for deep action
recognition. Proceedings of the 14th European Conference on
Computer Vision (ECCV). 20-36
(2016). https://doi.org/10.48550/arXiv.1608.00859
4 Tran D., Bourdev L., Fergus R., Torresani L., Paluri M.: Learning
spatiotemporal features with 3D convolutional networks. 2015 IEEE
International Conference on Computer Vision (ICCV). 4489-4497
(2015). https://doi.org/10.1109/ICCV.2015.510
5 Tran D., Ray J., Shou Z., Chang S., Paluri M.: ConvNet architecture
search for spatiotemporal feature learning. arXiv preprint.
1-12 (2017). https://doi.org/10.48550/arXiv.1708.05038
6 Diba A., Fayyaz M., Sharma V., Karami A., Arzani M., Yousefzadeh R.,
Van Gool L.: Temporal 3D ConvNets: New architecture and transfer
learning for video classification. arXiv preprint.
1-9 (2017). https://doi.org/10.48550/arXiv.1711.08200
7 Girdhar R., Carreira J., Doersch C., Zisserman A.: Video action
transformer network. 2019 IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR). 244-253 (2019).
https://doi.org/10.48550/arXiv.1812.02707
8 Bertasius G., Wang H., Torresani L.: Is space-time attention all you
need for video understanding? Proceedings of the 38th International
Conference on Machine Learning (ICML). (2021).
https://doi.org/10.48550/arXiv.2102.05095
9 Soomro K., Zamir A.R., Shah M.: UCF101: A dataset of 101 human actions
classes from videos in the wild. arXiv preprint.
1-7 (2012). https://doi.org/10.48550/arXiv.1212.0402
10 Kuehne H., Jhuang H., Garrote E., Poggio T., Serre T.: HMDB: A large
video database for human motion recognition. 2011 International
Conference on Computer Vision (ICCV). 2556-2563 (2011).
https://doi.org/10.1109/ICCV.2011.6126543
11 Carreira J., Zisserman A.: Quo vadis, action recognition? A new model
and the Kinetics dataset. 2017 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR). 6299-6308
(2017). https://doi.org/10.1109/CVPR.2017.502
12 Goyal R., Ebrahimi Kahou S., Michalski V., Materzyńska J., Westphal S.,
Kim H., Haenel V., Fruend I., Yianilos P., Mueller-Freitag M., Hoppe F.,
Thurau C., Bax I., Memisevic R.: The "something something" video
database for learning and evaluating visual common sense. 2017
IEEE International Conference on Computer Vision (ICCV). 5842-5850
(2017). https://doi.org/10.48550/arXiv.1706.04261
13 Lin J., Gan C., Han S.: TSM: Temporal shift module for efficient video
understanding. 2019 IEEE/CVF International Conference on
Computer Vision (ICCV). 7083-7093 (2019).
https://doi.org/10.1109/ICCV.2019.00718
14 Li Y., Ji B., Shi X., Zhang J., Kang B., Wang L.: TEA: Temporal
excitation and aggregation for action recognition. 2020 IEEE/CVF
Conference on Computer Vision and Pattern Recognition
(CVPR). 909-918 (2020). https://doi.org/10.1109/CVPR42600.2020.00099