cross-posted from: https://lemmy.sdf.org/post/31995242

Archived

Unveiling Trae: ByteDance’s AI IDE and Its Extensive Data Collection System

Trae - the coding assistant of China’s ByteDance - has rapidly emerged as a formidable competitor to established AI coding assistants like Cursor and GitHub Copilot. Its main selling point? It’s completely free - offering Claude 3.7 Sonnet and GPT-4o without any subscription fees. Unit 221B’s technical analysis, using network traffic interception, binary analysis, and runtime monitoring, has identified a sophisticated telemetry framework that continuously transmits data to multiple ByteDance servers. From a cybersecurity perspective, this represents a complex data collection operation with significant security and privacy implications.

[…]

Key Findings:

  • Persistent connections to minimum 5 unique ByteDance domains, creating multiple data transmission vectors
  • Continuous telemetry transmission even during idle periods, indicating an always-on monitoring system
  • Regular update checks and configuration pulls from ByteDance servers, allowing for dynamic control
  • Permanent device identification via machineId parameter, which appears to be derived from hardware identifiers, enabling long-term tracking capabilities
  • Local WebSocket channels observed collecting full file content, with portions potentially transmitted to remote servers
  • Complex local microservice architecture with redundant pathways for code data, suggesting a deliberate system design
  • JWT tokens and authentication data observed in multiple communication channels, presenting potential credential exposure concerns
  • Use of binary MessagePack format observed in data transfers, adding complexity to security analysis
  • Extensive behavioral tracking mechanisms capable of building detailed user activity profiles
  • Sophisticated data segregation across multiple endpoints, consistent with enterprise-grade telemetry systems

[…]