SCNN: An accelerator for compressed-sparse convolutional neural networks A Parashar, M Rhu, A Mukkara, A Puglielli, R Venkatesan, B Khailany, ... ACM SIGARCH computer architecture news 45 (2), 27-40, 2017 | 923 | 2017 |
GPUs and the future of parallel computing SW Keckler, WJ Dally, B Khailany, M Garland, D Glasco IEEE micro 31 (5), 7-17, 2011 | 728 | 2011 |
Imagine: Media processing with streams B Khailany, WJ Dally, UJ Kapasi, P Mattson, J Namkoong, JD Owens, ... IEEE micro 21 (2), 35-46, 2001 | 485 | 2001 |
Programmable stream processors UJ Kapasi, S Rixner, WJ Dally, B Khailany, JH Ahn, P Mattson, JD Owens Computer 36 (8), 54-62, 2003 | 439 | 2003 |
Register organization for media processing S Rixner, WJ Dally, B Khailany, P Mattson, UJ Kapasi, JD Owens Proceedings Sixth International Symposium on High-Performance Computer …, 2000 | 401 | 2000 |
The Imagine stream processor UJ Kapasi, WJ Dally, S Rixner, JD Owens, B Khailany Proceedings. IEEE International Conference on Computer Design: VLSI in …, 2002 | 357 | 2002 |
A bandwidth-efficient architecture for media processing S Rixner, WJ Dally, UJ Kapasi, B Khailany, A Lopez-Lagunas, PR Mattson, ... Proceedings. 31st Annual ACM/IEEE International Symposium on …, 1998 | 355 | 1998 |
CudaDMA: optimizing GPU memory bandwidth via warp specialization M Bauer, H Cook, B Khailany Proceedings of 2011 international conference for high performance computing …, 2011 | 193 | 2011 |
Evaluating the Imagine stream architecture JH Ahn, WJ Dally, B Khailany, UJ Kapasi, A Das ACM SIGARCH Computer Architecture News 32 (2), 14, 2004 | 188 | 2004 |
Timeloop: A systematic approach to DNN accelerator evaluation A Parashar, P Raina, YS Shao, YH Chen, VA Ying, A Mukkara, ... 2019 IEEE international symposium on performance analysis of systems and …, 2019 | 166 | 2019 |
Simba: Scaling deep-learning inference with multi-chip-module-based architecture YS Shao, J Clemons, R Venkatesan, B Zimmer, M Fojtik, N Jiang, B Keller, ... Proceedings of the 52nd Annual IEEE/ACM International Symposium on …, 2019 | 155 | 2019 |
Unifying primary cache, scratch, and register file memories in a throughput processor M Gebhart, SW Keckler, B Khailany, R Krashinsky, WJ Dally 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, 96-106, 2012 | 146 | 2012 |
A programmable 512 GOPS stream processor for signal, image, and video processing BK Khailany, T Williams, J Lin, EP Long, M Rygh, DFW Tovey, WJ Dally IEEE Journal of solid-state circuits 43 (1), 202-213, 2008 | 140 | 2008 |
Efficient conditional operations for data-parallel architectures UJ Kapasi, WJ Dally, S Rixner, PR Mattson, JD Owens, B Khailany Proceedings of the 33rd annual ACM/IEEE International Symposium on …, 2000 | 137 | 2000 |
Stream Processors: Programmability and Efficiency: Will this new kid on the block muscle out ASIC and DSP? WJ Dally, UJ Kapasi, B Khailany, JH Ahn, A Das Queue 2 (1), 52-62, 2004 | 105 | 2004 |
DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement Y Lin, Z Jiang, J Gu, W Li, S Dhar, H Ren, B Khailany, DZ Pan IEEE Transactions on Computer-Aided Design of Integrated Circuits and …, 2020 | 93 | 2020 |
The VLSI implementation and evaluation of area-and energy-efficient streaming media processors BK Khailany Stanford University, 2003 | 75 | 2003 |
Exploring the VLSI scalability of stream processors B Khailany, WJ Dally, S Rixner, UJ Kapasi, JD Owens, B Towles The Ninth International Symposium on High-Performance Computer Architecture …, 2003 | 73 | 2003 |
MAGNet: A modular accelerator generator for neural networks R Venkatesan, YS Shao, M Wang, J Clemons, S Dai, M Fojtik, B Keller, ... 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 1-8, 2019 | 61 | 2019 |
High performance graph convolutional networks with applications in testability analysis Y Ma, H Ren, B Khailany, H Sikka, L Luo, K Natarajan, B Yu Proceedings of the 56th Annual Design Automation Conference 2019, 1-6, 2019 | 61 | 2019 |