When the training and test data come from different corpora, the mismatch between their data distributions becomes pronounced, causing a large gap between validation and test performance. To address this problem, a cross-corpus speech emotion recognition method based on adversarial training is proposed. Through adversarial training across corpora, the proposed method effectively reduces inter-corpus differences and improves the extraction of domain-invariant emotion features. In addition, a multi-head attention mechanism is introduced to model the relative dependencies among elements at different positions in the speech sequence, enhancing the extraction of emotion-salient features. With IEMOCAP as the source domain and MSP-IMPROV as the target domain, the proposed method outperforms the benchmark methods by about 0.91%–12.22%; with MSP-IMPROV as the source domain and IEMOCAP as the target domain, it likewise outperforms the benchmark methods by about 2.27%–6.90%. Therefore, when emotion labels are unavailable in the target domain, the proposed cross-corpus speech emotion recognition method is more effective at extracting domain-invariant, emotion-salient features.
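To make the attention component concrete, below is a minimal, hypothetical NumPy sketch (not the authors' code) of scaled dot-product multi-head self-attention over a speech feature sequence, the standard mechanism for modeling dependencies between elements at different positions; all function and variable names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """X: (seq_len, d_model) frame features; W*: (d_model, d_model) projections."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    # Project to queries/keys/values and split into heads: (heads, seq, d_head).
    Q = (X @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    K = (X @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (heads, seq, seq).
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)          # each row is a distribution over positions
    heads = attn @ V                         # (heads, seq, d_head)
    # Concatenate heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 8, 5, 2
X = rng.standard_normal((seq_len, d_model))
Ws = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_self_attention(X, *Ws, num_heads)
print(out.shape)  # (5, 8)
```

In cross-corpus setups of this kind, such an attention block typically feeds both the emotion classifier and an adversarial corpus discriminator, which is trained to remove corpus-specific information from the shared features.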