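# Algorithms for Measuring the Similarity of Two Sentences in Java

Below are several common ways to measure the similarity of two sentences in Java:

## 1. Cosine Similarity

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CosineSimilarity {

    public static double cosineSimilarity(String text1, String text2) {
        // Tokenize on whitespace. Unsegmented Chinese text comes out as a single
        // token, so real applications should run a proper word segmenter first.
        String[] words1 = text1.trim().split("\\s+");
        String[] words2 = text2.trim().split("\\s+");

        // Build the term-frequency vectors, skipping empty tokens
        Map<String, Integer> vector1 = new HashMap<>();
        Map<String, Integer> vector2 = new HashMap<>();
        for (String word : words1) {
            if (!word.isEmpty()) vector1.put(word, vector1.getOrDefault(word, 0) + 1);
        }
        for (String word : words2) {
            if (!word.isEmpty()) vector2.put(word, vector2.getOrDefault(word, 0) + 1);
        }

        // Collect every distinct word
        Set<String> allWords = new HashSet<>(vector1.keySet());
        allWords.addAll(vector2.keySet());

        // Compute the dot product
        double dotProduct = 0;
        for (String word : allWords) {
            dotProduct += vector1.getOrDefault(word, 0) * vector2.getOrDefault(word, 0);
        }

        // Compute the vector magnitudes
        double magnitude1 = 0;
        for (int count : vector1.values()) {
            magnitude1 += Math.pow(count, 2);
        }
        magnitude1 = Math.sqrt(magnitude1);

        double magnitude2 = 0;
        for (int count : vector2.values()) {
            magnitude2 += Math.pow(count, 2);
        }
        magnitude2 = Math.sqrt(magnitude2);

        // An empty input has a zero vector; return 0 instead of dividing by zero
        if (magnitude1 == 0 || magnitude2 == 0) {
            return 0.0;
        }

        // Compute the cosine similarity
        return dotProduct / (magnitude1 * magnitude2);
    }

    public static void main(String[] args) {
        // Pre-segmented input: the words are separated by spaces
        String text1 = "我 喜欢 编程";
        String text2 = "编程 使 我 快乐";
        System.out.println("Similarity: " + cosineSimilarity(text1, text2));
    }
}
```

## 2. Levenshtein Distance (Edit Distance)

```java
public class LevenshteinDistance {

    public static int calculate(String s1, String s2) {
        // dp[i][j] is the edit distance between the first i chars of s1
        // and the first j chars of s2
        int[][] dp = new int[s1.length() + 1][s2.length() + 1];

        for (int i = 0; i <= s1.length(); i++) {
            dp[i][0] = i;
        }
        for (int j = 0; j <= s2.length(); j++) {
            dp[0][j] = j;
        }

        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= s2.length(); j++) {
                int cost = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
                dp[i][j] = Math.min(
                    Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1), // deletion, insertion
                    dp[i - 1][j - 1] + cost                       // substitution or match
                );
            }
        }

        return dp[s1.length()][s2.length()];
    }

    public static double similarity(String s1, String s2) {
        int maxLength = Math.max(s1.length(), s2.length());
        // Two empty strings are identical; this also avoids dividing by zero
        if (maxLength == 0) {
            return 1.0;
        }
        int distance = calculate(s1, s2);
        return 1 - (double) distance / maxLength;
    }

    public static void main(String[] args) {
        String s1 = "kitten";
        String s2 = "sitting";
        System.out.println("Similarity: " + similarity(s1, s2));
    }
}
```

## 3. Jaccard Similarity Coefficient

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class JaccardSimilarity {

    public static double calculate(String text1, String text2) {
        // Whitespace tokenization; unsegmented Chinese text needs a word segmenter first
        Set<String> set1 = new HashSet<>(Arrays.asList(text1.trim().split("\\s+")));
        Set<String> set2 = new HashSet<>(Arrays.asList(text2.trim().split("\\s+")));

        Set<String> intersection = new HashSet<>(set1);
        intersection.retainAll(set2);

        Set<String> union = new HashSet<>(set1);
        union.addAll(set2);

        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Pre-segmented input: the words are separated by spaces
        String text1 = "我 喜欢 编程";
        String text2 = "编程 使 我 快乐";
        System.out.println("Similarity: " + calculate(text1, text2));
    }
}
```

## 4. Using a Third-Party Library (Apache Commons Text)

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.text.similarity.CosineSimilarity;
import org.apache.commons.text.similarity.JaccardSimilarity;
import org.apache.commons.text.similarity.LevenshteinDistance;

public class SimilarityExample {

    // CosineSimilarity takes term-frequency maps rather than raw strings
    private static Map<CharSequence, Integer> termFrequencies(String text) {
        Map<CharSequence, Integer> frequencies = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            frequencies.merge(word, 1, Integer::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // Cosine similarity over whitespace-separated words
        CosineSimilarity cosine = new CosineSimilarity();
        Double cosineScore = cosine.cosineSimilarity(
                termFrequencies("我 喜欢 编程"), termFrequencies("编程 使 我 快乐"));

        // Jaccard similarity; this class compares the sets of characters
        JaccardSimilarity jaccard = new JaccardSimilarity();
        Double jaccardScore = jaccard.apply("我喜欢编程", "编程使我快乐");

        // Levenshtein distance, normalized to a similarity score
        LevenshteinDistance levenshtein = LevenshteinDistance.getDefaultInstance();
        Integer distance = levenshtein.apply("kitten", "sitting");
        double levenshteinScore = 1 - (double) distance / Math.max("kitten".length(), "sitting".length());

        System.out.println("Cosine similarity: " + cosineScore);
        System.out.println("Jaccard similarity: " + jaccardScore);
        System.out.println("Levenshtein similarity: " + levenshteinScore);
    }
}
```

## Recommendations

1. For short texts, Levenshtein distance tends to work well.
2. For long documents, cosine similarity is more appropriate.
3. The Jaccard coefficient is simple to compute and suits quick comparisons.
4. Real applications may need to combine several algorithms.

All of these algorithms can be adapted and tuned to specific needs, for example by adding preprocessing steps such as synonym handling and stemming.

[2025-05-22 14:02:47 | AI Q&A | 1383 points]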
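The cosine and Jaccard examples above split on whitespace, which leaves an unsegmented Chinese sentence such as 我喜欢编程 as a single token. One possible workaround, when no word segmenter is available, is to compare overlapping character n-grams instead of words. The sketch below is only an illustration under that assumption: the class name `CharNGramJaccard` and its methods are hypothetical, and a real segmenter would usually give better results.

```java
import java.util.HashSet;
import java.util.Set;

public class CharNGramJaccard {

    // Returns the distinct n-grams of the text, counted in code points so that
    // surrogate pairs stay intact. Whitespace is dropped before slicing.
    static Set<String> nGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        int[] codePoints = text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .toArray();
        Set<String> grams = new HashSet<>();
        if (codePoints.length == 0) {
            return grams;
        }
        if (codePoints.length < n) {
            // Text shorter than n: treat the whole text as one gram
            grams.add(new String(codePoints, 0, codePoints.length));
            return grams;
        }
        for (int i = 0; i + n <= codePoints.length; i++) {
            grams.add(new String(codePoints, i, n));
        }
        return grams;
    }

    // Jaccard coefficient |A ∩ B| / |A ∪ B| over the two n-gram sets
    public static double similarity(String text1, String text2, int n) {
        Set<String> grams1 = nGrams(text1, n);
        Set<String> grams2 = nGrams(text2, n);
        if (grams1.isEmpty() && grams2.isEmpty()) {
            return 1.0; // two empty texts are treated as identical
        }
        Set<String> intersection = new HashSet<>(grams1);
        intersection.retainAll(grams2);
        int unionSize = grams1.size() + grams2.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        System.out.println("Bigram Jaccard: "
                + similarity("我喜欢编程", "编程使我快乐", 2));
    }
}
```

With bigrams, the two sample sentences share only 编程 out of eight distinct bigrams, so the score is 0.125. A smaller `n` is more forgiving of word-order changes, while a larger `n` rewards longer shared phrases.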