add-filter (#7)

- update gitattr w/ new file (c2e8f5cf7b74ba0cb2c4c1715fd23bb428eae522)
- add filter ranges (7bff5a46a11256821b44435a3fe588daab6f1e94)
- update readme with filter info (ac2c562d1b8386789e420f056b959434f9973590)

Co-authored-by: William Chen <[email protected]>

Files changed (3)

.gitattributes +1 -0
README.md +9 -0
keep_ranges_1_0_1.json +3 -0

.gitattributes CHANGED

@@ -37,3 +37,4 @@ droid_language_annotations.json filter=lfs diff=lfs merge=lfs -text
 cam2base_extrinsic_superset.json filter=lfs diff=lfs merge=lfs -text
 cam2base_extrinsics.json filter=lfs diff=lfs merge=lfs -text
 cam2cam_extrinsics.json filter=lfs diff=lfs merge=lfs -text
+keep_ranges_1_0_1.json filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -74,6 +74,15 @@ pixel_positions = intrinsics_matrix @ robot_gripper_position_cam
 pixel_positions = pixel_positions[:2] / pixel_positions[2] # Shape 2 x 1 # Done!
 ```
+## Filtering Data
+Many episodes in DROID contain significant pauses. This is an issue when training models, as these pauses typically happen at the start of episodes, causing the policy to likewise output idle actions when in the home position. To remedy this, we recommend filtering the data you train your policy on, removing all frames that map to idle actions.
+We provide `keep_ranges_1_0_1.json`, which maps episode keys to a list of time step ranges that should *not* be filtered out. The episode keys uniquely identify each episode and are defined as `f"{recording_folderpath}--{file_path}"`. We opt for this unique identifier because both pieces of information are found in the episodes' RLDS metadata, so the key is easy to compute (even with TensorFlow symbolic operations).
+To use this data, we recommend creating a `tf.lookup.StaticHashTable` identifying all frames that should not be filtered (with all other frames being filtered by default). Frames can be uniquely identified by simply concatenating their episode key with their time step within the episode.
+This particular filter `json` is meant for `droid/1.0.1`, NOT `droid/1.0.0`. It was computed by finding, within each episode, all continuous sequences of non-idle actions that are at least 16 steps long (1 second of wall-clock time) and are not interrupted by 8 or more idle actions.
 ## Accessing Annotation Data
 All annotations are stored in `json` files which you can download from this repository.
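
As a rough illustration of the frame lookup described in the new "Filtering Data" section, the sketch below expands keep ranges into a set of frame keys in plain Python. It makes several assumptions that the README does not state: each range is a half-open `[start, end)` pair, the frame key joins the episode key and the time step with `--`, and the toy dictionary stands in for the real contents of `keep_ranges_1_0_1.json`. The helper names are hypothetical. In a `tf.data` pipeline, the same keys could be loaded into a `tf.lookup.StaticHashTable` through a `tf.lookup.KeyValueTensorInitializer` instead of a Python set.

```python
def episode_key(recording_folderpath, file_path):
    # Episode key format as given in the README.
    return f"{recording_folderpath}--{file_path}"


def build_keep_frame_keys(keep_ranges):
    """Expand {episode_key: [[start, end], ...]} into a set of frame keys."""
    keep = set()
    for ep_key, ranges in keep_ranges.items():
        for start, end in ranges:
            # Assumption: ranges are half-open, i.e. [start, end).
            for t in range(start, end):
                # Assumption: "--" separates the episode key from the time step.
                keep.add(f"{ep_key}--{t}")
    return keep


# Toy stand-in for json.load(open("keep_ranges_1_0_1.json")).
toy_ranges = {episode_key("rec/a", "traj_a.h5"): [[2, 5], [9, 11]]}
keep_keys = build_keep_frame_keys(toy_ranges)


def keep_frame(recording_folderpath, file_path, step):
    # Frames not listed in the keep set are filtered out by default.
    return f"{episode_key(recording_folderpath, file_path)}--{step}" in keep_keys
```

A set is enough for eager filtering of a small sample; a hash table becomes worthwhile when the check has to run inside a graph-mode `Dataset.filter`.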

keep_ranges_1_0_1.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5046049ab62a2df2f802df89cf0888b720f852ce2557849417d40899c9a38bc8
+size 28573266
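
The pointer above stands in for the actual JSON, so the ranges themselves are not visible in this diff. The rule the README gives for computing them (non-idle sequences of at least 16 steps, not interrupted by 8 or more idle actions) can be sketched as follows. This is one possible reading, not the authors' script: it assumes a per-step boolean `is_idle` list is already available, that idle gaps shorter than 8 steps are absorbed into the surrounding range, and that the 16-step minimum applies to the merged range. The function name and parameters are hypothetical.

```python
def compute_keep_ranges(is_idle, min_len=16, break_gap=8):
    """Return half-open [start, end) ranges of steps to keep."""
    runs = []
    start = None        # first non-idle step of the current run
    last_active = None  # most recent non-idle step seen so far
    for t, idle in enumerate(is_idle):
        if idle:
            continue
        if start is None:
            start = t
        elif t - last_active - 1 >= break_gap:
            # A gap of break_gap or more idle steps ends the current run.
            runs.append([start, last_active + 1])
            start = t
        last_active = t
    if start is not None:
        runs.append([start, last_active + 1])
    # Drop runs shorter than the minimum length.
    return [r for r in runs if r[1] - r[0] >= min_len]


# 10 idle, 20 active, 3 idle, 5 active, 12 idle, 4 active
flags = [True] * 10 + [False] * 20 + [True] * 3 + [False] * 5 + [True] * 12 + [False] * 4
ranges = compute_keep_ranges(flags)
```

In this toy episode the 3-step pause is absorbed, the 12-step pause splits the episode, and the trailing 4-step run is dropped for being shorter than 16 steps.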