- Developed the web UI for displaying the captured data
- Completed the end-to-end data pipeline from device to cloud to annotation (see the sketch after this list)
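
Below is a minimal sketch of what that device-to-cloud-to-annotation flow could look like. The names (`CaptureSession`, `upload_session`, `enqueue_for_annotation`), the JSON-per-session layout, and the in-memory queue are illustrative assumptions, not the deployed implementation.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class CaptureSession:
    session_id: str
    device_id: str
    started_at: float      # Unix time when recording began
    modalities: list[str]  # e.g. ["rgb", "audio", "tactile", "hand_pose"]


def upload_session(session: CaptureSession, outbox: Path) -> Path:
    """Device side: serialize a finished session into the upload outbox."""
    outbox.mkdir(parents=True, exist_ok=True)
    path = outbox / f"{session.session_id}.json"
    path.write_text(json.dumps(asdict(session)))
    return path


def enqueue_for_annotation(uploaded: Path, queue: list[str]) -> None:
    """Cloud side: register an uploaded session so the web UI can surface
    it to annotators."""
    queue.append(uploaded.stem)


annotation_queue: list[str] = []
session = CaptureSession(
    session_id=str(uuid.uuid4()),
    device_id="headset-01",
    started_at=time.time(),
    modalities=["rgb", "audio", "tactile", "hand_pose"],
)
enqueue_for_annotation(upload_session(session, Path("outbox")), annotation_queue)
print(annotation_queue)  # -> ["<session uuid>"]
```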
The development of humanoids and general-purpose robots is constrained by limited access to large-scale data. To build models that truly generalize, we need to construct a diverse dataset of task demonstrations [1]. Just as LLMs achieved breakthroughs via internet-scale data, complex humanoid systems will require a similar scale of data.
The current industry processes for collecting training data for humanoid platforms are teleoperation, puppeting, and simulation. Teleoperation and puppeting require shipping and deploying expensive hardware to each new environment, driving up costs. Simulation avoids hardware deployment but demands the time-consuming creation of high-fidelity virtual assets, limiting the diversity and realism of the data; this holds for both human-controlled and reinforcement-learning simulation environments.
As a result, teleoperation/puppeting and simulation sit at opposite ends of the cost versus data-diversity spectrum: teleoperation and puppeting yield rich, diverse datasets but are bottlenecked by deployment complexity and expense, whereas simulations are easy to distribute yet struggle to capture the full breadth of real-world variation.
At their core, humans rely on a combination of vision, spatial awareness, audio, and tactile feedback to perform manipulation tasks. We therefore propose a head-mounted sensor rig that captures all of these modalities while a human completes their work. It will be paired with gloves that collect tactile data along with hand pose estimates.
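
One way to make the modality list concrete is a single time-synchronized record per capture tick. The field names, shapes, and sample rates below are assumptions for illustration, not the rig's actual on-device schema.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class MultimodalFrame:
    """One capture tick with every sensor sampled against a shared clock."""
    timestamp_ns: int                   # shared capture clock across sensors
    rgb: np.ndarray                     # (H, W, 3) uint8 headset camera image
    head_pose: np.ndarray               # (6,) x, y, z, roll, pitch, yaw
    audio: np.ndarray                   # mono PCM samples for this tick
    tactile: dict[str, np.ndarray]      # glove -> (num_taxels,) pressures
    hand_pose: dict[str, np.ndarray]    # glove -> (num_joints,) joint angles


frame = MultimodalFrame(
    timestamp_ns=0,
    rgb=np.zeros((480, 640, 3), dtype=np.uint8),
    head_pose=np.zeros(6),
    audio=np.zeros(1600, dtype=np.float32),  # 0.1 s at an assumed 16 kHz
    tactile={"left": np.zeros(32), "right": np.zeros(32)},
    hand_pose={"left": np.zeros(21), "right": np.zeros(21)},
)
```

Keying every sensor to one `timestamp_ns` keeps the downstream annotation step simple: anything that knows a time window can find the frames it covers.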
As participants perform their tasks (cleaning, manufacturing, folding clothes, etc.), the headset records what they see while the gloves record tactile and kinematic hand data. We further augment this data by incentivizing participants to dictate their current task. It is worth underscoring the uniqueness of collecting audio annotations and tactile sensing data at this scale: there are currently no large-scale datasets that pair tactile data with task annotations [1].
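
The dictated annotations become useful once they are aligned with the sensor streams. A minimal sketch of that alignment, assuming the speech-to-text stage emits `(start_ns, end_ns, text)` utterances against the same clock as the frames:

```python
from bisect import bisect_left, bisect_right


def label_frames(frame_times_ns: list[int],
                 utterance: tuple[int, int, str]) -> dict[int, str]:
    """Map every frame timestamp inside the utterance window to its text.

    Assumes frame_times_ns is sorted and shares the utterance's clock.
    """
    start_ns, end_ns, text = utterance
    lo = bisect_left(frame_times_ns, start_ns)
    hi = bisect_right(frame_times_ns, end_ns)
    return {t: text for t in frame_times_ns[lo:hi]}


frames = [0, 33_000_000, 66_000_000, 99_000_000]  # ~30 Hz timestamps
print(label_frames(frames, (30_000_000, 70_000_000, "folding a shirt")))
# -> {33000000: 'folding a shirt', 66000000: 'folding a shirt'}
```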