Unlike mainstream classification methods, which rely on large amounts of annotated data, we introduce a cross-modal alignment approach for zero-shot image classification. The key idea is to use text-attribute queries learned from the seen classes to guide local feature responses in unseen classes. First.