- What is a ROS2 Dataset?
- Why Datasets are Critical in Robotics
- Setting Up a ROS2 Dataset with Foxglove
- Visualizing Datasets in RViz
- Using Datasets with TurtleBot3 SLAM
- Example Code for Dataset Analysis
- Launch File Example for Automating Dataset Runs
- Common Mistakes When Using ROS2 Datasets
- Building Your Own Dataset
- Future of ROS2 Datasets
- Conclusion
- FAQs
When you start working with ROS2 datasets, you quickly realize that handling real-world robotics data requires both structure and the right set of tools. Whether you’re running a SLAM experiment, visualizing data in RViz, or testing algorithms with a TurtleBot3, datasets serve as the backbone of your robotics development workflow. In this blog, we’ll walk through what ROS2 datasets are, how they’re used, and demonstrate with examples using Foxglove Studio, RViz, and TurtleBot3 SLAM. This guide will help you understand not just the “what” but also the “how” of working with robotics datasets in ROS2.
What is a ROS2 Dataset?
A ROS2 dataset is a collection of recorded robotic data (e.g., lidar scans, odometry, IMU readings, images) that is stored in ROS bag files. These bag files allow developers to replay sensor data as if it were being produced by a real robot. Instead of relying solely on hardware, you can use datasets to test algorithms, validate mapping pipelines, or even train AI perception models.
For example, a ROS2 dataset might contain:
- Laser scans (/scan)
- Odometry (/odom)
- IMU data (/imu)
- Camera images (/camera/color/image_raw)
This makes datasets a powerful tool for debugging, testing, and simulation.
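Before feeding recorded data into an algorithm, it is common to filter out invalid lidar readings (dropouts reported as inf/NaN, or hits outside the sensor's rated interval). Here is a minimal pure-Python sketch of that filtering step; the range bounds are illustrative assumptions, roughly matching a TurtleBot3 LDS-01 lidar, and the sample scan values are made up.

```python
import math

def valid_ranges(ranges, range_min=0.12, range_max=3.5):
    """Keep only lidar readings inside the sensor's valid interval.

    Mirrors the range_min/range_max fields of sensor_msgs/LaserScan;
    the default bounds here are illustrative, not taken from a real bag.
    """
    return [r for r in ranges
            if math.isfinite(r) and range_min <= r <= range_max]

# Simulated scan: two valid hits, one out-of-range reading, one dropout
scan = [0.5, 1.2, 4.0, float('inf')]
print(valid_ranges(scan))  # [0.5, 1.2]
```

The same idea applies to any recorded sensor stream: sanitize first, then replay into your pipeline.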
Why Datasets are Critical in Robotics
Robotics involves uncertainty. Sensors are noisy, environments are dynamic, and algorithms often fail in ways you cannot predict in simulation. Datasets give you a repeatable way to test under identical conditions. Here’s why they matter:
- Debugging algorithms – Replaying the same dataset helps find reproducible bugs.
- Benchmarking – Compare algorithm performance across different datasets.
- Training AI models – Labeled datasets can be used to train perception pipelines.
- Simulation to reality transfer – Test algorithms on real-world data without needing constant robot access.
One of the most well-known public robotics datasets is KITTI, which provides raw sensor data from autonomous driving.
Setting Up a ROS2 Dataset with Foxglove
Foxglove Studio is a modern visualization tool designed for robotics data. Unlike RViz, which is bundled with ROS, Foxglove can handle large datasets efficiently, making it ideal for replaying long SLAM or navigation runs.
Steps to Load a ROS2 Dataset in Foxglove
- Install Foxglove Studio (official site).
- Open a recorded ROS bag file:
ros2 bag play dataset_folder
- Connect Foxglove Studio to your ROS2 session:
- Open Foxglove Studio
- Add a Foxglove WebSocket connection (served by the foxglove_bridge node)
- Visualize topics such as /scan, /odom, or /camera.
This workflow helps you inspect dataset contents visually, ensuring your data is consistent before plugging it into an algorithm.
Visualizing Datasets in RViz
RViz remains the go-to visualization tool for most ROS developers. It lets you view live or replayed sensor data in 3D.
Example: Visualizing Lidar and Odometry
Launch RViz:
rviz2
Add a LaserScan display and select /scan.
Add an Odometry display and select /odom.
Replay the dataset:
ros2 bag play dataset_folder
You should now see lidar points and odometry traces updating in RViz as if the robot were moving in real-time.
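What RViz does under the hood when drawing a LaserScan is a simple polar-to-Cartesian projection. The sketch below reproduces that projection in plain Python, assuming the angle_min and angle_increment fields of sensor_msgs/LaserScan; the three-beam scan is a made-up example.

```python
import math

def scan_to_points(ranges, angle_min, angle_increment):
    """Convert polar lidar ranges to (x, y) points in the sensor frame,
    the same projection RViz performs when rendering a LaserScan."""
    points = []
    for i, r in enumerate(ranges):
        if not math.isfinite(r):
            continue  # skip dropouts
        theta = angle_min + i * angle_increment
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

# Three beams, 90 degrees apart, starting straight ahead
pts = scan_to_points([1.0, 2.0, 1.0], angle_min=0.0,
                     angle_increment=math.pi / 2)
for x, y in pts:
    print(f"({x:.2f}, {y:.2f})")
```

If the points look wrong in RViz but correct here, the problem is usually a missing or incorrect TF transform, not the data itself.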
Using Datasets with TurtleBot3 SLAM
The TurtleBot3 is one of the most popular educational robots in ROS2. When paired with a dataset, it becomes a repeatable testbed for SLAM.
Example Workflow
Install TurtleBot3 packages:
sudo apt install 'ros-humble-turtlebot3*'
Launch SLAM with TurtleBot3 (set the robot model first):
export TURTLEBOT3_MODEL=burger
ros2 launch turtlebot3_cartographer cartographer.launch.py
Replay dataset with lidar scans:
ros2 bag play tb3_dataset
Save generated map:
ros2 run nav2_map_server map_saver_cli -f ~/map
This process allows you to test mapping algorithms without physically driving the robot.
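The map_saver_cli step above writes an image plus a map.yaml containing the grid resolution and origin. To use the map programmatically, you often need to convert a grid cell back to world coordinates. This is a sketch of that conversion under the usual map_server convention (origin at the grid's lower-left corner, image rows counting downward from the top); the 384x384 / 5 cm / (-10, -10) figures are illustrative, not values from a specific run.

```python
def cell_to_world(row, col, resolution, origin, height):
    """Convert an occupancy-grid cell (row, col) to world (x, y).

    origin is the world pose of the grid's lower-left corner; image
    row 0 sits at the top of the map, hence the height-based flip.
    """
    x = origin[0] + (col + 0.5) * resolution
    y = origin[1] + (height - row - 0.5) * resolution
    return x, y

# Illustrative values of the kind map_saver_cli writes into map.yaml
x, y = cell_to_world(row=383, col=0, resolution=0.05,
                     origin=(-10.0, -10.0), height=384)
print(x, y)  # bottom-left cell center -> (-9.975, -9.975)
```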
Example Code for Dataset Analysis
Sometimes you don’t just want to visualize but also process datasets programmatically. With ROS2’s Python API you can subscribe to topics while a bag is being replayed (for reading bag files directly without playback, the rosbag2_py API is also available).
Python Example: Reading Odometry and Lidar
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import LaserScan
from nav_msgs.msg import Odometry

class DatasetReader(Node):
    def __init__(self):
        super().__init__('dataset_reader')
        self.create_subscription(LaserScan, '/scan', self.lidar_callback, 10)
        self.create_subscription(Odometry, '/odom', self.odom_callback, 10)

    def lidar_callback(self, msg):
        self.get_logger().info(f"Received {len(msg.ranges)} lidar points")

    def odom_callback(self, msg):
        x, y = msg.pose.pose.position.x, msg.pose.pose.position.y
        self.get_logger().info(f"Odom position: ({x:.2f}, {y:.2f})")

def main():
    rclpy.init()
    node = DatasetReader()
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()

if __name__ == '__main__':
    main()
This simple script subscribes to /scan and /odom during dataset playback and logs key values.
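Once you are collecting positions in the odometry callback, a useful next step is to reduce them to a summary statistic. The helper below computes total path length from a sequence of (x, y) positions; the sample track is hypothetical, standing in for positions accumulated during a dataset replay.

```python
import math

def path_length(positions):
    """Total distance along a sequence of (x, y) odometry positions,
    summed over consecutive pairs."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))

# Positions sampled during a hypothetical dataset replay:
# 1 m forward, then 2 m sideways
track = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
print(path_length(track))  # 3.0
```

Comparing path length across replays of the same bag is a quick sanity check that your playback and subscription pipeline is deterministic.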
Launch File Example for Automating Dataset Runs
Instead of manually launching multiple components, you can automate dataset playback with a launch file.
from launch import LaunchDescription
from launch.actions import ExecuteProcess
from launch_ros.actions import Node

def generate_launch_description():
    return LaunchDescription([
        # rosbag2 playback has no standalone node, so invoke the CLI
        ExecuteProcess(
            cmd=['ros2', 'bag', 'play', 'tb3_dataset'],
            output='screen'
        ),
        Node(
            package='rviz2',
            executable='rviz2',
            output='screen'
        )
    ])
This ensures that RViz starts automatically alongside dataset playback.
Common Mistakes When Using ROS2 Datasets
- Mismatched ROS versions – A dataset recorded in ROS2 Humble may not replay smoothly in Galactic.
- Missing TF transforms – Many datasets lack transformations, causing RViz visualizations to break.
- Incorrect topic remapping – If your algorithm expects /scan, but dataset has /lidar_scan, you must remap topics.
Example remap:
ros2 bag play dataset_folder --remap /lidar_scan:=/scan
Building Your Own Dataset
If you’re working with TurtleBot3 or any ROS2-enabled robot, recording your own dataset is straightforward:
ros2 bag record /scan /odom /imu /camera/color/image_raw
This records lidar, odometry, IMU, and camera topics into a dataset for later replay.
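Camera topics dominate bag size, so it pays to estimate storage needs before a long recording session. This back-of-the-envelope calculation multiplies per-topic rate by message size; every figure below is an illustrative assumption (uncompressed 640x480 RGB images, typical TurtleBot3-class sensor rates), not a measured value.

```python
def estimated_bag_size_mb(topics, duration_s):
    """Rough size estimate for a recording session.

    topics maps name -> (rate_hz, bytes_per_message); all figures
    passed in are assumptions, not measurements.
    """
    total_bytes = sum(rate * size * duration_s
                      for rate, size in topics.values())
    return total_bytes / 1e6

topics = {
    '/scan': (5, 3000),                       # 360 ranges + intensities
    '/odom': (30, 700),
    '/imu': (100, 300),
    '/camera/color/image_raw': (30, 921600),  # 640x480x3 raw image
}
print(f"~{estimated_bag_size_mb(topics, 60):.0f} MB per minute")
```

Running the numbers this way usually makes the case for recording compressed image topics, or leaving the camera out of long runs entirely.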
Future of ROS2 Datasets
With the rise of AI and cloud robotics, datasets are moving beyond simple bag files. Developers now store datasets in cloud platforms, enabling distributed replay, multi-robot testing, and integration with ML pipelines. Tools like Foxglove Data Platform are making large-scale dataset management seamless.
Conclusion
ROS2 datasets give developers the ability to test, validate, and train robotics systems without needing constant access to hardware. Whether using Foxglove Studio for visualization, RViz for debugging, or TurtleBot3 for SLAM experiments, datasets create a reproducible and efficient development pipeline.
At Robotisim, we provide learning resources to help you bridge the gap from theory to practice in robotics using ROS2 datasets. Explore more on our site to see how you can get hands-on with robotics today.
FAQs
Q1: Which ROS2 version should I start with?
A: ROS2 Humble (Ubuntu 22.04) is currently the most stable release with long-term support, making it the best starting point.
Q2: Can I use ROS2 datasets without owning a robot?
A: Yes. Public datasets like KITTI and EuRoC MAV let you replay lidar, camera, and IMU data directly in ROS2, no hardware required.
Q3: What’s the difference between RViz and Foxglove?
A: RViz is bundled with ROS and best for quick visualization. Foxglove provides a modern interface, handles large datasets, and supports multi-robot debugging.
Q4: Is TurtleBot3 good for beginners?
A: Absolutely. TurtleBot3 is affordable, well-supported in ROS2, and widely used in SLAM, navigation, and dataset recording experiments.
Q5: How do I create my own ROS2 dataset?
A: Use the ros2 bag record command to capture topics like /scan, /odom, and /camera while running your robot. This creates a bag file for future replay.