Unified Parallel C (UPC) is designed to improve programmer productivity on distributed-memory machines. Its shared-memory abstraction, however, makes performance analysis difficult: accesses to local shared data incur extra overhead, and accesses to remote shared data trigger implicit communication. To our knowledge, there are no mature software tools for systematic analysis and tuning of shared-memory access performance in UPC programs. We develop a mechanism that tracks shared-memory accesses and correlates them with UPC source lines, functions, and data structures. We then apply tool-assisted analysis to a set of UPC programs. For the NAS UPC benchmark, we achieve dramatic performance improvement over the unoptimized implementation, as well as speedups of up to 2x over the fully hand-tuned implementation. We expect our approach to be effective in tuning a wide range of UPC programs.
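As a rough illustration of what correlating shared-memory accesses with source lines and data structures can look like, the following minimal sketch uses plain C rather than UPC. It is not the mechanism developed in this work. All names (`access_site`, `track_access`, `TRACK`, `report`) are hypothetical, and the ownership rule assumes UPC's default cyclic layout, in which element i of a shared array has affinity to thread i modulo the thread count.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SITES 64

/* Hypothetical record: one entry per (file, line, variable) access site. */
typedef struct {
    const char *file;
    int line;
    const char *var;       /* name of the shared data structure */
    unsigned long local;   /* accesses owned by the accessing thread */
    unsigned long remote;  /* accesses that would require communication */
} access_site;

access_site sites[MAX_SITES];
int n_sites = 0;

/* Find or create the record for a site; returns NULL if the table is full. */
access_site *site_lookup(const char *file, int line, const char *var) {
    assert(file != NULL && var != NULL);
    for (int i = 0; i < n_sites; i++) {
        if (sites[i].line == line &&
            strcmp(sites[i].file, file) == 0 &&
            strcmp(sites[i].var, var) == 0) {
            return &sites[i];
        }
    }
    if (n_sites == MAX_SITES) {
        return NULL;
    }
    access_site *s = &sites[n_sites++];
    s->file = file;
    s->line = line;
    s->var = var;
    s->local = 0;
    s->remote = 0;
    return s;
}

/* Record one access: 'owner' is the thread with affinity to the element,
   'me' is the thread performing the access. */
void track_access(const char *file, int line, const char *var,
                  int owner, int me) {
    access_site *s = site_lookup(file, line, var);
    if (s == NULL) {
        return;  /* table full: drop the sample */
    }
    if (owner == me) {
        s->local++;
    } else {
        s->remote++;
    }
}

/* Assumption: cyclic layout, so element idx belongs to idx % nthreads. */
#define TRACK(arr, idx, me, nthreads) \
    track_access(__FILE__, __LINE__, #arr, (int)((idx) % (nthreads)), (me))

/* Print per-site totals so hot remote-access lines stand out. */
void report(void) {
    for (int i = 0; i < n_sites; i++) {
        printf("%s:%d %s local=%lu remote=%lu\n",
               sites[i].file, sites[i].line, sites[i].var,
               sites[i].local, sites[i].remote);
    }
}
```

A caller would place `TRACK(a, i, me, nthreads)` next to each access to a shared array `a` and call `report()` at the end of the run. Sites with a high remote count are candidates for tuning, for example by changing the data layout or by replacing element-wise accesses with bulk transfers.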