Java Set 完全指南

"Set 是去重的艺术" —— 从哈希表到红黑树，掌握Set就掌握了唯一性约束的核心

Set 是 Java 集合框架的核心接口，用于存储唯一的元素集合。想象一个身份证验证系统：每个人只能出现一次，不允许重复，这就是Set的精髓。

🎯 快速索引

基础概念 - 5分钟理解Set核心特性
实现类对比 - 选择最适合的Set实现
HashSet深度解析 - 面试必问的底层原理
TreeSet排序机制 - 自动排序的实现
实战应用 - 真实项目中的使用模式
性能优化指南 - 提升应用性能的技巧
高频面试题 - 拿下offer的关键点

🔍 什么是 Set 集合？

Set 核心特性

Set 是 Java 集合框架的核心接口，继承自 Collection，表示不包含重复元素的集合：

特性	说明	实际场景
🔐 元素唯一性	自动去重，不允许重复	用户ID去重、标签管理
⚡ 高效查找	O(1)或O(log n)查找	快速存在性检查、权限验证
🔄 无序性（大部分实现）	不保持插入顺序	统计词频、数学集合运算
🎯 基于优化结构	哈希表或红黑树	高性能数据验证
🧰 null支持	大部分允许一个null	可选配置处理

Set vs List 对比

维度	Set	List
元素重复	❌ 不允许	✅ 允许
顺序保证	大部分无序（除LinkedHashSet、TreeSet）	✅ 保持插入顺序
索引访问	❌ 不支持	✅ 支持随机访问
查找性能	⚡ O(1)或O(log n)	🐌 ArrayList O(n)，LinkedList O(n)
适用场景	去重、集合运算、快速查找	顺序数据、索引访问、允许重复

🏗️ 常见 Set 实现对比

性能特征可视化

实现类详细对比表

实现类	底层结构	时间复杂度	空间复杂度	线程安全	特殊功能	推荐指数
HashSet ⭐	HashMap	O(1)	O(n)	❌	基于哈希表	⭐⭐⭐⭐⭐
LinkedHashSet 🔗	HashMap+双向链表	O(1)	O(n)	❌	保持插入顺序	⭐⭐⭐⭐
TreeSet 🌳	红黑树	O(log n)	O(n)	❌	自动排序、范围查询	⭐⭐⭐⭐
EnumSet 🎭	位向量	O(1)	O(1)	❌	枚举专属，性能极高	⭐⭐⭐
CopyOnWriteArraySet 📝	数组快照	O(1)	O(n)	✅	读多写少场景	⭐⭐⭐
ConcurrentSkipListSet 🚀	跳表	O(log n)	O(n)	✅	高并发有序Set	⭐⭐⭐

💡 选择建议：

90%场景用HashSet：需要最高性能，不关心顺序

需要插入顺序用LinkedHashSet：保持FIFO顺序

需要排序用TreeSet：自动排序，支持范围查询

枚举类型用EnumSet：极致性能优化

并发读多写少用CopyOnWriteArraySet

Set 应用场景矩阵

业务场景	推荐实现	性能特点	代码示例
用户ID去重	`HashSet`	快速查找去重	`Set<Long> userIds = new HashSet<>();`
标签管理	`HashSet`	高效存在性检查	`Set<String> tags = new HashSet<>();`
权限集合	`HashSet`	快速权限验证	`Set<Permission> permissions = new HashSet<>();`
时间窗口数据	`LinkedHashSet`	保持时间顺序	`Set<Event> timeWindow = new LinkedHashSet<>();`
排行榜	`TreeSet`	自动排序	`NavigableSet<Integer> rankings = new TreeSet<>();`
枚举配置	`EnumSet`	极致性能	`EnumSet<Status> statuses = EnumSet.allOf(Status.class);`
配置去重	`HashSet`	配置项唯一性	`Set<ConfigItem> uniqueConfigs = new HashSet<>();`

🧠 HashSet 深度解析

核心参数配置

public class HashSet<E> extends AbstractSet<E> implements Set<E>, Cloneable, Serializable {
    // 默认初始容量
    private static final int DEFAULT_INITIAL_CAPACITY = 16;

    // 加载因子
    private static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // 存储元素的HashMap
    private transient HashMap<E, Object> map;

    // 虚拟value对象，所有key都指向同一个对象
    private static final Object PRESENT = new Object();
}

去重机制实现

// HashSet 的 add 方法源码
public boolean add(E e) {
    // 实际上是调用 HashMap 的 put 方法
    return map.put(e, PRESENT) == null;
}

// HashMap 的 put 方法关键逻辑
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;

    // 1. 计算数组索引
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);  // 直接插入新节点
    else {
        Node<K,V> e; K k;
        // 2. 遍历链表/红黑树查找重复元素
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k)))) {
            e = p;
        }
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);  // 插入链表尾部
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k)))) {
                    break;  // 找到重复元素
                }
                p = e;
            }
        }

        // 3. 如果找到重复元素，更新value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            e.value = value;
            return null;  // 返回null表示更新，而不是添加
        }
    }
}

去重流程可视化

容量与扩容机制

public class HashSetCapacityAnalysis {

    public static void demonstrateGrowth() {
        Set<String> set = new HashSet<>();

        System.out.println("=== HashSet 扩容过程 ===");

        // 初始状态
        printCapacity(set, 0);  // capacity = 16, size = 0

        // 添加元素直到触发扩容
        for (int i = 1; i <= 12; i++) {  // 第12个元素触发扩容：12/16 = 0.75
            set.add("item" + i);
        }
        printCapacity(set, 12); // capacity = 16, size = 12 (即将扩容)

        // 第13个元素触发扩容
        set.add("item13");
        printCapacity(set, 13); // capacity = 32, size = 13 (扩容后)

        // 继续添加直到下次扩容
        for (int i = 14; i <= 25; i++) {  // 第25个元素触发扩容：25/32 ≈ 0.78
            set.add("item" + i);
        }
        printCapacity(set, 25); // capacity = 32, size = 25 (即将扩容)

        // 第26个元素触发扩容
        set.add("item26");
        printCapacity(set, 26); // capacity = 64, size = 26 (扩容后)
    }

    private static void printCapacity(Set<?> set, int elementCount) {
        try {
            Field mapField = HashSet.class.getDeclaredField("map");
            mapField.setAccessible(true);
            HashMap<?, ?> map = (HashMap<?, ?>) mapField.get(set);

            Field tableField = HashMap.class.getDeclaredField("table");
            tableField.setAccessible(true);
            Object[] table = (Object[]) tableField.get(map);

            System.out.printf("元素数量: %d, 数组容量: %d, 负载因子: %.2f%n",
                elementCount, table.length, (double) elementCount / table.length);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

// 容量计算工具
public class SetCapacityOptimizer {

    // 根据预期元素数量计算最优初始容量
    public static int calculateOptimalCapacity(int expectedSize, float loadFactor) {
        return (int) (expectedSize / loadFactor) + 1;
    }

    // 创建优化容量的HashSet
    public static <T> HashSet<T> createOptimizedHashSet(int expectedSize) {
        int capacity = calculateOptimalCapacity(expectedSize, 0.75f);
        return new HashSet<>(capacity);
    }

    // 使用示例
    public static void example() {
        // 预期存储10000个元素
        Set<String> optimized = createOptimizedHashSet(10000);

        // 比默认容量减少扩容次数
        for (int i = 0; i < 10000; i++) {
            optimized.add("item" + i);
        }
    }
}

🌳 TreeSet 红黑树与排序机制

红黑树核心特性

TreeSet 构造函数与排序策略

public class TreeSetExamples {

    // 1. 自然排序 - 元素必须实现 Comparable 接口
    public void naturalOrdering() {
        Set<Integer> numbers = new TreeSet<>();
        numbers.add(5);
        numbers.add(2);
        numbers.add(8);
        numbers.add(1);
        // 输出: [1, 2, 5, 8] (自动升序)
    }

    // 2. 自定义排序 - 使用 Comparator
    public void customOrdering() {
        // 降序排列
        Set<Integer> descNumbers = new TreeSet<>((a, b) -> b.compareTo(a));
        descNumbers.add(5);
        descNumbers.add(2);
        descNumbers.add(8);
        descNumbers.add(1);
        // 输出: [8, 5, 2, 1] (降序)

        // 按字符串长度排序
        Set<String> lengthOrder = new TreeSet<>(Comparator.comparingInt(String::length));
        lengthOrder.add("apple");
        lengthOrder.add("banana");
        lengthOrder.add("pear");
        // 输出: [pear, apple, banana] (按长度)
    }

    // 3. 复杂对象排序
    public void objectOrdering() {
        class Person {
            String name;
            int age;
            Person(String name, int age) {
                this.name = name;
                this.age = age;
            }
        }

        // 按年龄排序
        Set<Person> people = new TreeSet<>(Comparator.comparingInt(p -> p.age));
        people.add(new Person("Alice", 25));
        people.add(new Person("Bob", 30));
        people.add(new Person("Charlie", 20));
        // 按年龄排序: Charlie(20), Alice(25), Bob(30)
    }
}

NavigableSet 高级操作

public class NavigableSetAdvancedOperations {

    public static void demonstrateNavigableSet() {
        NavigableSet<Integer> set = new TreeSet<>();

        // 添加数据
        set.addAll(Arrays.asList(10, 20, 30, 40, 50));

        System.out.println("原始集合: " + set);  // [10, 20, 30, 40, 50]

        // 1. 小于等于给定元素的最大值
        Integer floor = set.floor(25);  // 20
        System.out.println("floor(25) = " + floor);

        // 2. 严格小于给定元素的最大值
        Integer lower = set.lower(25);  // 20
        System.out.println("lower(25) = " + lower);

        // 3. 大于等于给定元素的最小值
        Integer ceiling = set.ceiling(25);  // 30
        System.out.println("ceiling(25) = " + ceiling);

        // 4. 严格大于给定元素的最小值
        Integer higher = set.higher(25);  // 30
        System.out.println("higher(25) = " + higher);

        // 5. 获取第一个元素
        Integer first = set.first();  // 10
        System.out.println("first() = " + first);

        // 6. 获取最后一个元素
        Integer last = set.last();  // 50
        System.out.println("last() = " + last);

        // 7. 获取子集
        NavigableSet<Integer> headSet = set.headSet(30, false);  // [10, 20]
        System.out.println("headSet(30, false) = " + headSet);

        NavigableSet<Integer> tailSet = set.tailSet(30, false);  // [40, 50]
        System.out.println("tailSet(30, false) = " + tailSet);

        // 8. 获取范围内的子集
        NavigableSet<Integer> subSet = set.subSet(20, true, 40, false);  // (20, 40)
        System.out.println("subSet(20, true, 40, false) = " + subSet);  // [20, 30]
    }

    // 实际应用：排行榜系统
    public static void leaderBoardExample() {
        NavigableSet<Integer> topScores = new TreeSet<>(Comparator.reverseOrder());

        // 添加分数
        topScores.addAll(Arrays.asList(85, 92, 78, 95, 88));
        System.out.println("分数排序: " + topScores);  // [95, 92, 88, 85, 78]

        // 获取前3名
        Integer first = topScores.first();  // 95分 (第1名)
        Integer second = topScores.higher(first);  // 92分 (第2名)
        Integer third = topScores.higher(second);  // 88分 (第3名)

        System.out.printf("前三名: 第1名=%d分, 第2名=%d分, 第3名=%d分%n",
                first, second, third);
    }
}

⚡ 性能优化指南

1. 初始容量优化

public class SetPerformanceOptimization {

    // ❌ 错误：频繁扩容导致性能下降
    public void inefficientGrowth() {
        Set<String> set = new HashSet<>();  // 默认容量16

        for (int i = 0; i < 100000; i++) {
            set.add("item" + i);
            // 会多次触发扩容：16→32→64→128→256→512→1024→2048→4096→8192→16384→32768→65536
        }
    }

    // ✅ 正确：预估容量避免扩容
    public void efficientGrowth() {
        // 预期100000个元素，负载因子0.75
        int optimalCapacity = (int) (100000 / 0.75) + 1;  // 133334
        Set<String> set = new HashSet<>(optimalCapacity);

        for (int i = 0; i < 100000; i++) {
            set.add("item" + i);
            // 只需要一次扩容（如果需要的话）
        }
    }

    // 性能测试对比
    public static void performanceComparison() {
        int elements = 100000;

        // 测试默认容量
        long start = System.nanoTime();
        Set<String> set1 = new HashSet<>();
        for (int i = 0; i < elements; i++) {
            set1.add("item" + i);
        }
        long time1 = System.nanoTime() - start;

        // 测试优化容量
        start = System.nanoTime();
        int capacity = (int) (elements / 0.75) + 1;
        Set<String> set2 = new HashSet<>(capacity);
        for (int i = 0; i < elements; i++) {
            set2.add("item" + i);
        }
        long time2 = System.nanoTime() - start;

        System.out.printf("默认容量耗时: %.2f ms%n", time1 / 1_000_000.0);
        System.out.printf("优化容量耗时: %.2f ms%n", time2 / 1_000_000.0);
        System.out.printf("性能提升: %.1fx%n", (double) time1 / time2);
    }
}

2. 批量操作优化

public class BatchSetOperations {

    // 批量添加优化
    public static void optimizedBatchAdd() {
        Set<String> targetSet = new HashSet<>();
        List<String> sourceList = Arrays.asList(
            "item1", "item2", "item3", "item4", "item5"
        );

        // ✅ 推荐：批量添加
        targetSet.addAll(sourceList);

        // ❌ 避免：逐个添加（虽然Set去重，但addAll通常更高效）
        // for (String item : sourceList) {
        //     targetSet.add(item);
        // }
    }

    // 大数据量分批处理
    public static void batchProcessing(List<String> largeData, int batchSize) {
        Set<String> uniqueItems = new HashSet<>();

        for (int i = 0; i < largeData.size(); i += batchSize) {
            int end = Math.min(i + batchSize, largeData.size());
            List<String> batch = largeData.subList(i, end);

            // 批量添加到Set中自动去重
            uniqueItems.addAll(batch);

            System.out.printf("处理批次 %d-%d, 当前唯一项数: %d%n",
                    i, end - 1, uniqueItems.size());
        }
    }

    // 集合运算优化
    public static void setOperationsOptimization() {
        Set<Integer> set1 = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5));
        Set<Integer> set2 = new HashSet<>(Arrays.asList(4, 5, 6, 7, 8));

        // 并集 - 使用addAll
        Set<Integer> union = new HashSet<>(set1);
        union.addAll(set2);  // {1, 2, 3, 4, 5, 6, 7, 8}

        // 交集 - 使用retainAll
        Set<Integer> intersection = new HashSet<>(set1);
        intersection.retainAll(set2);  // {4, 5}

        // 差集 - 使用removeAll
        Set<Integer> difference = new HashSet<>(set1);
        difference.removeAll(set2);  // {1, 2, 3}

        // 对称差集
        Set<Integer> symDiff1 = new HashSet<>(set1);
        Set<Integer> symDiff2 = new HashSet<>(set2);
        symDiff1.removeAll(set2);  // {1, 2, 3}
        symDiff2.removeAll(set1);  // {6, 7, 8}
        symDiff1.addAll(symDiff2);  // {1, 2, 3, 6, 7, 8}
    }
}

3. 内存使用优化

public class SetMemoryOptimization {

    // 1. 及时清理不需要的Set
    public static void cleanupUnusedSets() {
        Map<String, Set<String>> cache = new HashMap<>();

        // 使用缓存
        Set<String> activeSet = cache.get("active");
        if (activeSet != null) {
            activeSet.add("newItem");
        }

        // 及时清理
        cache.remove("inactive");  // 清理不再使用的缓存项
    }

    // 2. 使用适当大小的数据类型
    public static void appropriateDataTypes() {
        // ❌ 大对象集合
        Set<User> userSet = new HashSet<>();

        // ✅ 使用基本类型集合（如果适用）
        // 需要引入第三方库如 fastutil
        // IntOpenHashSet userIds = new IntOpenHashSet();
    }

    // 3. 避免内存泄漏
    public static void avoidMemoryLeaks() {
        // ❌ 错误：静态Set长期持有引用
        // private static Set<Object> cache = new HashSet<>();

        // ✅ 正确：使用WeakReference或及时清理
        Set<Object> weakCache = Collections.newSetFromMap(new WeakHashMap<>());

        // 或者定期清理
        ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
        cleaner.scheduleAtFixedRate(() -> {
            // 清理逻辑
        }, 1, TimeUnit.HOURS);
    }
}

🎯 Set 实战应用与最佳实践

1. 数据去重工具类

public class DeduplicationUtils {

    // 基于属性的列表去重
    public static <T> List<T> distinctByKey(List<T> list, Function<? super T, ?> keyExtractor) {
        Set<Object> seen = new HashSet<>();
        List<T> result = new ArrayList<>();

        for (T item : list) {
            Object key = keyExtractor.apply(item);
            if (seen.add(key)) {  // add返回true表示key未出现过
                result.add(item);
            }
        }

        return result;
    }

    // 使用示例
    public static void example() {
        class Person {
            String name;
            int age;
            Person(String name, int age) {
                this.name = name;
                this.age = age;
            }
        }

        List<Person> people = Arrays.asList(
            new Person("Alice", 25),
            new Person("Bob", 30),
            new Person("Alice", 26),  // 重复name
            new Person("Charlie", 25),
            new Person("Bob", 31)   // 重复name
        );

        // 按name去重，保留第一次出现的
        List<Person> uniqueByName = distinctByKey(people, Person::name);
        // 结果: Alice(25), Bob(30), Charlie(25)

        System.out.println("去重后的人数: " + uniqueByName.size());
    }

    // 多属性联合去重
    public static <T> List<T> distinctByKeys(List<T> list, Function<? super T, ?>... keyExtractors) {
        Set<List<Object>> seen = new HashSet<>();
        List<T> result = new ArrayList<>();

        for (T item : list) {
            List<Object> keys = Arrays.stream(keyExtractors)
                    .map(extractor -> extractor.apply(item))
                    .collect(Collectors.toList());

            if (seen.add(keys)) {
                result.add(item);
            }
        }

        return result;
    }
}

2. 权限管理系统

public class PermissionManager {

    private final Set<String> userPermissions;
    private final Set<String> rolePermissions;

    public PermissionManager() {
        // 用户权限
        this.userPermissions = new HashSet<>();
        userPermissions.add("READ");
        userPermissions.add("WRITE");
        userPermissions.add("DELETE");

        // 角色权限
        this.rolePermissions = new HashSet<>();
        rolePermissions.add("ADMIN_READ");
        rolePermissions.add("ADMIN_WRITE");
        rolePermissions.add("ADMIN_DELETE");
        rolePermissions.add("USER_READ");
        rolePermissions.add("USER_WRITE");
    }

    // 检查用户权限
    public boolean hasPermission(String permission) {
        return userPermissions.contains(permission);
    }

    // 检查角色权限
    public boolean hasRolePermission(String permission) {
        return rolePermissions.contains(permission);
    }

    // 批量权限检查
    public boolean hasAllPermissions(Set<String> requiredPermissions) {
        return userPermissions.containsAll(requiredPermissions);
    }

    public boolean hasAnyPermission(Set<String> permissions) {
        return permissions.stream().anyMatch(userPermissions::contains);
    }

    // 动态添加权限
    public void grantPermission(String permission) {
        userPermissions.add(permission);
    }

    public void revokePermission(String permission) {
        userPermissions.remove(permission);
    }

    // 权限组合示例
    public boolean canReadAndWrite() {
        Set<String> required = new HashSet<>(Arrays.asList("READ", "WRITE"));
        return hasAllPermissions(required);
    }
}

3. 缓存系统实现

public class SimpleCache<K, V> {

    private final Map<K, CacheEntry<K, V>> cache;
    private final int maxSize;

    public SimpleCache(int maxSize) {
        this.maxSize = maxSize;
        this.cache = new LinkedHashMap<K, CacheEntry<K, V>>(maxSize, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, CacheEntry<K, V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    public void put(K key, V value) {
        cache.put(key, new CacheEntry<>(value, System.currentTimeMillis()));
    }

    public V get(K key) {
        CacheEntry<K, V> entry = cache.get(key);
        if (entry != null && !entry.isExpired()) {
            return entry.getValue();
        }
        return null;
    }

    public Set<K> getKeySet() {
        return cache.keySet();
    }

    public void clear() {
        cache.clear();
    }

    // 缓存条目
    private static class CacheEntry<K, V> {
        private final V value;
        private final long timestamp;
        private final long ttl;  // 生存时间（毫秒）

        public CacheEntry(V value, long timestamp) {
            this(value, timestamp, 300000);  // 默认5分钟TTL
        }

        public CacheEntry(V value, long timestamp, long ttl) {
            this.value = value;
            this.timestamp = timestamp;
            this.ttl = ttl;
        }

        public V getValue() {
            return value;
        }

        public boolean isExpired() {
            return System.currentTimeMillis() - timestamp > ttl;
        }
    }

    // 使用示例
    public static void example() {
        SimpleCache<String, String> cache = new SimpleCache<>(1000);

        // 存储数据
        cache.put("user:1001", "Alice");
        cache.put("user:1002", "Bob");

        // 检索数据
        String user1 = cache.get("user:1001");  // "Alice"
        String user2 = cache.get("user:1002");  // "Bob"

        // 获取所有键（用于监控等）
        Set<String> allKeys = cache.getKeySet();
        System.out.println("缓存中的键: " + allKeys);
    }
}

4. 实时数据流处理

public class RealTimeDataProcessor {

    private final Set<String> uniqueEvents;
    private final Set<String> recentEvents;
    private final int windowSize;

    public RealTimeDataProcessor(int windowSize) {
        this.windowSize = windowSize;
        this.uniqueEvents = new HashSet<>();
        this.recentEvents = new LinkedHashSet<>();  // 保持时间顺序
    }

    // 处理新事件
    public synchronized void processEvent(String event) {
        // 检查是否为新事件
        if (uniqueEvents.add(event)) {
            System.out.println("新事件: " + event);

            // 添加到最近事件窗口
            recentEvents.add(event);

            // 维护窗口大小
            if (recentEvents.size() > windowSize) {
                Iterator<String> iterator = recentEvents.iterator();
                iterator.next();  // 移除最旧的事件
                iterator.remove();
            }
        } else {
            System.out.println("重复事件: " + event);
        }
    }

    // 获取统计信息
    public EventStatistics getStatistics() {
        return new EventStatistics(
            uniqueEvents.size(),
            recentEvents.size(),
            new ArrayList<>(recentEvents)
        );
    }

    // 统计信息类
    public static class EventStatistics {
        private final int totalUniqueEvents;
        private final int recentEventsCount;
        private final List<String> recentEventsList;

        public EventStatistics(int totalUnique, int recentCount, List<String> recentList) {
            this.totalUniqueEvents = totalUnique;
            this.recentEventsCount = recentCount;
            this.recentEventsList = recentList;
        }

        // getters...
    }
}

🎯 Set 高频面试题精讲

🥇 1. HashSet vs TreeSet vs LinkedHashSet 的区别？

public class SetComparison {

    public static void demonstrateDifference() {
        // 准备测试数据
        List<Integer> data = Arrays.asList(5, 2, 8, 1, 3);

        // HashSet - 无序，基于哈希表
        Set<Integer> hashSet = new HashSet<>(data);
        System.out.println("HashSet: " + hashSet);  // 顺序不确定，如 [1, 2, 3, 5, 8]

        // LinkedHashSet - 保持插入顺序
        Set<Integer> linkedHashSet = new LinkedHashSet<>(data);
        System.out.println("LinkedHashSet: " + linkedHashSet);  // [5, 2, 8, 1, 3] (保持插入顺序)

        // TreeSet - 自动排序
        Set<Integer> treeSet = new TreeSet<>(data);
        System.out.println("TreeSet: " + treeSet);  // [1, 2, 3, 5, 8] (自然排序)

        // 性能测试
        performanceTest(hashSet, "HashSet");
        performanceTest(linkedHashSet, "LinkedHashSet");
        performanceTest(treeSet, "TreeSet");
    }

    private static void performanceTest(Set<Integer> set, String name) {
        int testSize = 100000;
        Random random = new Random();

        // 添加测试
        long addStart = System.nanoTime();
        for (int i = 0; i < testSize; i++) {
            set.add(random.nextInt(testSize * 10));
        }
        long addTime = System.nanoTime() - addStart;

        // 查找测试
        long findStart = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            set.contains(random.nextInt(testSize * 10));
        }
        long findTime = System.nanoTime() - findStart;

        System.out.printf("%s - 添加耗时: %.2f ms, 查找耗时: %.2f ms%n",
                name, addTime / 1_000_000.0, findTime / 1_000_000.0);
    }
}

核心对比表：

特性	HashSet	LinkedHashSet	TreeSet
底层结构	HashMap	HashMap + 双向链表	红黑树
时间复杂度	O(1)	O(1)	O(log n)
顺序保证	无序	插入顺序	自然排序/自定义排序
null支持	允许1个null	允许1个null	不允许null
线程安全	否	否	否
内存占用	较低	中等	较高
适用场景	快速查找、无序去重	需要保持插入顺序	需要排序、范围查询

🥈 2. HashSet 的去重原理是什么？

public class HashSetDeduplicationMechanism {

    // hashCode() 和 equals() 的重要性演示
    public static void demonstrateDeduplication() {
        class BadKey {
            private int value;

            public BadKey(int value) {
                this.value = value;
            }

            // ❌ 错误：只重写equals，没有重写hashCode
            @Override
            public boolean equals(Object obj) {
                if (this == obj) return true;
                if (!(obj instanceof BadKey)) return false;
                return value == ((BadKey) obj).value;
            }
            // 没有重写hashCode()，使用Object的hashCode()
        }

        class GoodKey {
            private int value;

            public GoodKey(int value) {
                this.value = value;
            }

            // ✅ 正确：同时重写equals和hashCode
            @Override
            public boolean equals(Object obj) {
                if (this == obj) return true;
                if (!(obj instanceof GoodKey)) return false;
                return value == ((GoodKey) obj).value;
            }

            @Override
            public int hashCode() {
                return Objects.hash(value);
            }
        }

        Set<BadKey> badSet = new HashSet<>();
        Set<GoodKey> goodSet = new HashSet<>();

        // 添加相同的key对象
        badSet.add(new BadKey(1));
        badSet.add(new BadKey(1));  // 应该去重，但可能不会

        goodSet.add(new GoodKey(1));
        goodSet.add(new GoodKey(1));  // 正确去重

        System.out.println("BadKey Set 大小: " + badSet.size());  // 可能是2
        System.out.println("GoodKey Set 大小: " + goodSet.size());  // 确定是1
    }

    // hashCode() 重写原则
    public static class HashCodePrinciples {

        // ✅ 原则1：equals为true的对象，hashCode必须相同
        public static boolean principle1(Object a, Object b) {
            return a.equals(b) ? (a.hashCode() == b.hashCode()) : true;
        }

        // ✅ 原则2：equals为false的对象，hashCode可以相同（哈希碰撞）
        public static boolean principle2(Object a, Object b) {
            return !a.equals(b) || (a.hashCode() != b.hashCode());
        }

        // ✅ 原则3：hashCode()方法应该在多次执行时返回相同的值
        public static boolean principle3(Object obj, int times) {
            int firstHash = obj.hashCode();
            for (int i = 1; i < times; i++) {
                if (obj.hashCode() != firstHash) return false;
            }
            return true;
        }
    }
}

🥉 3. 如何在 Set 中存储自定义对象？

public class CustomObjectInSet {

    // 用户类示例
    public static class User {
        private final String username;      // 不可变字段
        private final String email;
        private final int age;

        public User(String username, String email, int age) {
            this.username = username;
            this.email = email;
            this.age = age;
        }

        // ✅ 基于.equals()：使用唯一标识字段
        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof User)) return false;
            User other = (User) obj;
            // 通常使用业务主键作为equals的依据
            return Objects.equals(username, other.username);
        }

        // ✅ 基于.hashCode()：使用相同字段计算
        @Override
        public int hashCode() {
            // 使用equals中使用的相同字段
            return Objects.hash(username);
        }

        // 可选：实现Comparable用于TreeSet排序
        @Override
        public int compareTo(User other) {
            return this.username.compareTo(other.username);
        }

        @Override
        public String toString() {
            return String.format("User{username='%s', email='%s', age=%d}",
                    username, email, age);
        }

        // getters...
    }

    // 使用示例
    public static void example() {
        Set<User> userSet = new HashSet<>();

        User user1 = new User("alice", "alice@example.com", 25);
        User user2 = new User("bob", "bob@example.com", 30);
        User user3 = new User("alice", "alice2@example.com", 26);  // 相同username

        userSet.add(user1);
        userSet.add(user2);
        userSet.add(user3);  // 不会添加，因为username重复

        System.out.println("用户集合大小: " + userSet.size());  // 2
        System.out.println("用户列表: " + userSet);
    }
}

🔥 4. TreeSet 的排序机制详解

public class TreeSetSortingMechanism {

    // 演示自然排序
    public static void naturalOrdering() {
        Set<String> words = new TreeSet<>();
        words.add("banana");
        words.add("apple");
        words.add("cherry");
        words.add("date");

        System.out.println("自然排序: " + words);  // [apple, banana, cherry, date]
    }

    // 演示自定义排序
    public static void customOrdering() {
        // 按字符串长度排序，长度相同按字母顺序
        Comparator<String> lengthAndAlpha = Comparator
                .comparingInt(String::length)
                .thenComparing(Comparator.naturalOrder());

        Set<String> words = new TreeSet<>(lengthAndAlpha);
        words.add("banana");  // 6个字符
        words.add("apple");   // 5个字符
        words.add("cherry");  // 6个字符
        words.add("date");    // 4个字符

        System.out.println("自定义排序: " + words);  // [date, apple, banana, cherry]
    }

    // 演示复杂对象排序
    public static void complexObjectSorting() {
        class Product {
            String name;
            double price;
            int sales;

            Product(String name, double price, int sales) {
                this.name = name;
                this.price = price;
                this.sales = sales;
            }

            @Override
            public String toString() {
                return String.format("%s(¥%.2f, 销量:%d)", name, price, sales);
            }
        }

        // 多级排序：先按销量降序，销量相同按价格升序
        Comparator<Product> salesThenPrice = Comparator
                .comparingInt((Product p) -> -p.sales)  // 销量降序
                .thenComparingDouble(Product::getPrice);  // 价格升序

        Set<Product> products = new TreeSet<>(salesThenPrice);
        products.add(new Product("手机", 2999.0, 1000));
        products.add(new Product("电脑", 5999.0, 500));
        products.add(new Product("平板", 1999.0, 800));
        products.add(new Product("耳机", 299.0, 2000));

        System.out.println("多级排序: " + products);
        // 输出：手机(¥2999.00, 销量:1000), 耳机(¥299.00, 销量:2000),
        //       电脑(¥5999.00, 销量:500), 平板(¥1999.00, 销量:800)
    }

    // 演示null处理
    public static void nullHandling() {
        // TreeSet不允许null，会抛出NullPointerException
        try {
            Set<String> set = new TreeSet<>();
            set.add("hello");
            set.add(null);  // ❌ 抛出异常
        } catch (NullPointerException e) {
            System.out.println("TreeSet不允许null: " + e.getMessage());
        }

        // 解决方案：使用自定义Comparator处理null
        Comparator<String> nullSafeComparator = Comparator
                .nullsFirst(Comparator.naturalOrder());

        Set<String> set = new TreeSet<>(nullSafeComparator);
        set.add("hello");
        set.add(null);  // ✅ null排在最前面
        System.out.println("null安全排序: " + set);  // [null, hello]
    }
}

⚡ 5. Set 的线程安全问题

public class SetConcurrencyIssues {

    // 演示并发问题
    public static void demonstrateConcurrencyIssues() {
        Set<Integer> unsafeSet = new HashSet<>();
        int threadCount = 10;
        int itemsPerThread = 1000;

        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        CountDownLatch latch = new CountDownLatch(threadCount);

        for (int i = 0; i < threadCount; i++) {
            final int threadId = i;
            executor.submit(() -> {
                for (int j = 0; j < itemsPerThread; j++) {
                    unsafeSet.add(threadId * 1000 + j);
                }
                latch.countDown();
            });
        }

        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        executor.shutdown();

        // 期望：10000个元素
        // 实际：可能少于10000（数据丢失）
        System.out.println("期望元素数: " + (threadCount * itemsPerThread));
        System.out.println("实际元素数: " + unsafeSet.size());
    }

    // 线程安全解决方案
    public static void threadSafeSolutions() {
        // 方案1：Collections.synchronizedSet
        Set<Integer> synchronizedSet = Collections.synchronizedSet(new HashSet<>());

        // 方案2：ConcurrentHashMap.newKeySet() - 推荐
        Set<Integer> concurrentSet = ConcurrentHashMap.newKeySet();

        // 方案3：CopyOnWriteArraySet - 适用于读多写少
        Set<Integer> copyOnWriteSet = new CopyOnWriteArraySet<>();

        // 方案4：自定义并发Set
        Set<Integer> customConcurrentSet = new ConcurrentSkipListSet<>();

        // 使用示例
        safeSetOperation(synchronizedSet, "SynchronizedSet");
        safeSetOperation(concurrentSet, "ConcurrentHashMap.KeySet");
        safeSetOperation(copyOnWriteSet, "CopyOnWriteArraySet");
    }

    private static void safeSetOperation(Set<Integer> set, String name) {
        ExecutorService executor = Executors.newFixedThreadPool(10);

        // 多线程读写操作
        for (int i = 0; i < 10; i++) {
            final int threadId = i;
            executor.submit(() -> {
                // 写操作
                set.add(threadId * 1000);

                // 读操作
                boolean contains = set.contains(threadId * 1000 + 500);

                System.out.printf("%s - 线程%d: 写入%d, 查询%d结果: %b%n",
                        name, threadId, threadId * 1000, (threadId * 1000 + 500), contains);
            });
        }

        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

🎯 6. 大数据量下的 Set 优化

public class LargeDataSetOptimization {

    // 分批处理大数据
    public static void batchProcessLargeData(List<String> largeData) {
        int batchSize = 10000;
        Set<String> uniqueItems = new HashSet<>();

        for (int i = 0; i < largeData.size(); i += batchSize) {
            int end = Math.min(i + batchSize, largeData.size());
            List<String> batch = largeData.subList(i, end);

            // 批量去重
            uniqueItems.addAll(batch);

            System.out.printf("处理批次 %d-%d, 累计唯一项: %d%n",
                    i, end - 1, uniqueItems.size());

            // 可以在这里对批处理结果进行后续操作
            processBatchResult(uniqueItems, i, end);
        }
    }

    // 内存友好的去重
    public static <T> Set<T> memoryEfficientDeduplication(List<T> data) {
        // 分批处理，避免一次性加载所有数据到内存
        int batchSize = 5000;
        Set<T> result = new HashSet<>();

        for (int i = 0; i < data.size(); i += batchSize) {
            int end = Math.min(i + batchSize, data.size());
            List<T> batch = data.subList(i, end);

            // 对每批进行去重
            Set<T> batchUnique = new HashSet<>(batch);
            result.addAll(batchUnique);

            // 可以在这里进行垃圾回收提示
            if (i % (batchSize * 10) == 0) {
                System.gc();  // 在适当的时候建议GC
            }
        }

        return result;
    }

    // 流式处理（Java 8+）
    public static <T> Set<T> streamBasedDeduplication(Stream<T> dataStream) {
        return dataStream
                .filter(Objects::nonNull)  // 过滤null值
                .collect(Collectors.toSet());  // 自动去重
    }

    // 使用示例
    public static void example() {
        // 模拟大数据
        List<String> largeData = new ArrayList<>();
        Random random = new Random();
        for (int i = 0; i < 100000; i++) {
            largeData.add("item" + random.nextInt(10000));  // 有重复数据
        }

        System.out.println("原始数据大小: " + largeData.size());

        // 1. 批量处理
        Set<String> batchResult = memoryEfficientDeduplication(largeData);
        System.out.println("批量去重后大小: " + batchResult.size());

        // 2. 流式处理
        Set<String> streamResult = streamBasedDeduplication(largeData.stream());
        System.out.println("流式去重后大小: " + streamResult.size());
    }

    private static <T> void processBatchResult(Set<T> batch, int start, int end) {
        // 处理批次结果的逻辑
        System.out.printf("批次处理完成: %d-%d, 唯一项数: %d%n",
                start, end - 1, batch.size());
    }
}

📚 Set 面试通关清单

🎯 基础概念 (必须掌握)

Set vs List 核心区别
元素唯一性保证机制
equals() 和 hashCode() 约定
不同Set实现的适用场景

🚀 进阶知识 (面试加分)

🔧 实战能力 (项目应用)

💎 总结

Set 作为Java集合框架的重要组件，在数据去重、快速查找、集合运算等场景中发挥着重要作用：

核心要点回顾：

🎯 HashSet 为主力：90%场景首选，理解哈希去重是关键
🔗 LinkedHashSet 保持顺序：需要FIFO顺序时的选择
🌳 TreeSet 提供排序：自动排序，支持范围查询
🛡️ 线程安全注意：使用 ConcurrentHashMap.newKeySet() 等并发集合
⚡ 性能优化关键：预估容量、批量操作、及时清理

实战建议：

正确重写equals和hashCode：确保自定义对象能正确去重
选择合适的实现类：根据顺序、排序、性能需求选择
注意null处理：TreeSet不允许null，其他允许一个null
容量预分配：大数据量时预估容量避免扩容
使用批量方法：addAll、retainAll、removeAll等提升性能

掌握了这些知识点，你就可以在面试和实际开发中游刃有余地使用Set集合了！🚀

学习建议：深入理解HashSet的哈希机制，再扩展到TreeSet。重点掌握去重原理、性能特征和线程安全处理。记住：选择正确的数据结构比算法优化更重要！

🎯 快速索引​

🔍 什么是 Set 集合？​

Set 核心特性​

Set vs List 对比​

🏗️ 常见 Set 实现对比​

性能特征可视化​

实现类详细对比表​

Set 应用场景矩阵​

🧠 HashSet 深度解析​

核心参数配置​

去重机制实现​

去重流程可视化​

容量与扩容机制​

🌳 TreeSet 红黑树与排序机制​

红黑树核心特性​

TreeSet 构造函数与排序策略​

NavigableSet 高级操作​

⚡ 性能优化指南​

1. 初始容量优化​

2. 批量操作优化​

3. 内存使用优化​

🎯 Set 实战应用与最佳实践​

1. 数据去重工具类​

2. 权限管理系统​

3. 缓存系统实现​

4. 实时数据流处理​

🎯 Set 高频面试题精讲​

🥇 1. HashSet vs TreeSet vs LinkedHashSet 的区别？​

🥈 2. HashSet 的去重原理是什么？​

🥉 3. 如何在 Set 中存储自定义对象？​

🔥 4. TreeSet 的排序机制详解​

⚡ 5. Set 的线程安全问题​

🎯 6. 大数据量下的 Set 优化​

📚 Set 面试通关清单​

🎯 基础概念 (必须掌握)​

🚀 进阶知识 (面试加分)​

🔧 实战能力 (项目应用)​

💎 总结​