Feature Extractor
A feature extractor is in charge of preparing input features for audio or vision models. This includes feature extraction from sequences, e.g., pre-processing audio files to generate Log-Mel Spectrogram features, feature extraction from images, e.g., cropping image files, but also padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow tensors.
FeatureExtractionMixin
This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.
from_pretrained
( pretrained_model_name_or_path: Union, cache_dir: Union = None, force_download: bool = False, local_files_only: bool = False, token: Union = None, revision: str = 'main', **kwargs )
Parameters
-  pretrained_model_name_or_path (str or os.PathLike) — This can be either:
    - a string, the model id of a pretrained feature_extractor hosted inside a model repo on huggingface.co.
    - a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
    - a path or url to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
-  cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
-  force_download (bool, optional, defaults to False) — Whether or not to force to (re-)download the feature extractor files and override the cached versions if they exist.
-  resume_download (bool, optional, defaults to False) — Whether or not to delete incompletely received file. Attempts to resume the download if such a file exists.
-  proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
-  token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
-  revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
Instantiate a type of FeatureExtractionMixin from a feature extractor, e.g. a derived class of SequenceFeatureExtractor.
Examples:
from transformers import Wav2Vec2FeatureExtractor

# The base classes FeatureExtractionMixin and SequenceFeatureExtractor cannot be instantiated directly,
# so the examples use a derived class: Wav2Vec2FeatureExtractor.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h"
)  # Download the feature extractor configuration from huggingface.co and cache it.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "./test/saved_model/"
)  # E.g. the feature extractor (or model) was saved using save_pretrained('./test/saved_model/')
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("./test/saved_model/preprocessor_config.json")
# Keyword arguments that match attributes of the feature extractor override the loaded values.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False
)
assert feature_extractor.return_attention_mask is False
# With return_unused_kwargs=True, keyword arguments that were not consumed are returned as well.
feature_extractor, unused_kwargs = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False, return_unused_kwargs=True
)
assert feature_extractor.return_attention_mask is False
assert unused_kwargs == {"foo": False}
save_pretrained
( save_directory: Union, push_to_hub: bool = False, **kwargs )
Parameters
-  save_directory (str or os.PathLike) — Directory where the feature extractor JSON file will be saved (will be created if it does not exist).
-  push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
-  kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.
Save a feature_extractor object to the directory save_directory, so that it can be re-loaded using the
from_pretrained() class method.
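The save/load round trip can be sketched as follows, assuming the transformers package is installed. The constructor values are illustrative only and are chosen so that nothing has to be downloaded; the temporary directory is likewise just for the example.

```python
import os
import tempfile

from transformers import Wav2Vec2FeatureExtractor

# Illustrative configuration values (assumptions for this sketch); no checkpoint is downloaded.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True
)

with tempfile.TemporaryDirectory() as save_directory:
    # Writes the feature extractor JSON file into save_directory.
    feature_extractor.save_pretrained(save_directory)
    saved_files = sorted(os.listdir(save_directory))
    # Reload from the directory that was just written.
    reloaded = Wav2Vec2FeatureExtractor.from_pretrained(save_directory)

print(saved_files)
print(reloaded.sampling_rate, reloaded.padding_value)
```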
SequenceFeatureExtractor
class transformers.SequenceFeatureExtractor
( feature_size: int, sampling_rate: int, padding_value: float, **kwargs )
This is a general feature extraction class for speech recognition.
pad
( processed_features: Union, padding: Union = True, max_length: Optional = None, truncation: bool = False, pad_to_multiple_of: Optional = None, return_attention_mask: Optional = None, return_tensors: Union = None )
Parameters
-  processed_features (BatchFeature, list of BatchFeature, Dict[str, List[float]], Dict[str, List[List[float]]] or List[Dict[str, List[float]]]) — Processed inputs. Can represent one input (BatchFeature or Dict[str, List[float]]) or a batch of input values / vectors (list of BatchFeature, Dict[str, List[List[float]]] or List[Dict[str, List[float]]]) so you can use this method during preprocessing as well as in a PyTorch Dataloader collate function. Instead of List[float] you can have tensors (NumPy arrays, PyTorch tensors or TensorFlow tensors); see the note on the return type in the description below.
-  padding (bool, str or PaddingStrategy, optional, defaults to True) — Select a strategy to pad the returned sequences (according to the model’s padding side and padding index) among:
    - True or 'longest' (default): Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).
    - 'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided.
    - False or 'do_not_pad': No padding (i.e., can output a batch with sequences of different lengths).
-  max_length (int, optional) — Maximum length of the returned list and optionally padding length (see above).
-  truncation (bool) — Activates truncation to cut input sequences longer than max_length to max_length.
-  pad_to_multiple_of (int, optional) — If set, will pad the sequence to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.0 (Volta and later), or on TPUs which benefit from having sequence lengths be a multiple of 128.
-  return_attention_mask (bool, optional) — Whether to return the attention mask. If left to the default, will return the attention mask according to the specific feature_extractor’s default.
-  return_tensors (str or TensorType, optional) — If set, will return tensors instead of list of python integers. Acceptable values are:
    - 'tf': Return TensorFlow tf.constant objects.
    - 'pt': Return PyTorch torch.Tensor objects.
    - 'np': Return NumPy np.ndarray objects.
Pad input values / input vectors or a batch of input values / input vectors up to a predefined length or to the max sequence length in the batch.
The padding side (left/right) and the padding value are defined at the feature extractor level (with self.padding_side and self.padding_value).
If the processed_features passed is a dictionary of NumPy arrays, PyTorch tensors or TensorFlow tensors, the
result will use the same type unless you provide a different tensor type with return_tensors. In the case of
PyTorch tensors, however, you will lose the specific device of your tensors.
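A minimal sketch of padding a small batch, assuming the transformers package is installed. Wav2Vec2FeatureExtractor stands in for any derived class, and the constructor values and input numbers are illustrative only.

```python
from transformers import Wav2Vec2FeatureExtractor

# Illustrative configuration (assumption for this sketch); nothing is downloaded.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, return_attention_mask=True
)

# Two already-processed sequences of different lengths.
processed_features = {"input_values": [[0.1, 0.2, 0.3], [0.4]]}

# Pad to the longest sequence in the batch and return NumPy arrays.
batch = feature_extractor.pad(processed_features, padding="longest", return_tensors="np")

print(batch["input_values"].shape)
print(batch["attention_mask"])
```

The shorter sequence is filled with padding_value on the padding side, and the attention mask marks which positions hold real values.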
BatchFeature
class transformers.BatchFeature
( data: Optional = None, tensor_type: Union = None )
Parameters
-  data (dict, optional) — Dictionary of lists/arrays/tensors returned by the call/pad methods (‘input_values’, ‘attention_mask’, etc.).
-  tensor_type (Union[None, str, TensorType], optional) — You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/NumPy tensors at initialization.
Holds the output of the pad() and feature extractor specific __call__ methods.
This class is derived from a python dictionary and can be used as a dictionary.
convert_to_tensors
( tensor_type: Union = None )
Parameters
-  tensor_type (str or TensorType, optional) — The type of tensors to use. If str, should be one of the values of the enum TensorType. If None, no modification is done.
Convert the inner content to tensors.
to
( *args, **kwargs ) → BatchFeature
Send all values to device by calling v.to(*args, **kwargs) (PyTorch only). This should support casting to
different dtypes and sending the BatchFeature to a different device.
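A short sketch of BatchFeature used as a dictionary and converted to NumPy arrays, assuming the transformers package is installed. The key name and values are illustrative; the to() method is left out because it requires PyTorch.

```python
import numpy as np
from transformers import BatchFeature

# Illustrative data; in practice this comes from pad() or a feature extractor's __call__.
features = BatchFeature(data={"input_values": [[0.1, 0.2], [0.3, 0.4]]})

# BatchFeature behaves like a dictionary.
keys = list(features.keys())
was_list = isinstance(features["input_values"], list)

# Convert the inner lists to NumPy arrays in place.
features.convert_to_tensors(tensor_type="np")

print(keys, was_list)
print(type(features["input_values"]), features["input_values"].shape)
```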
ImageFeatureExtractionMixin
Mixin that contains utilities for preparing image features.
center_crop
( image, size ) → new_image
Parameters
-  image (PIL.Image.Image or np.ndarray or torch.Tensor of shape (n_channels, height, width) or (height, width, n_channels)) — The image to crop.
-  size (int or Tuple[int, int]) — The size to which to crop the image.
Returns
new_image
A center cropped PIL.Image.Image or np.ndarray or torch.Tensor of shape: (n_channels,
height, width).
Crops image to the given size using a center crop. Note that if the image is too small to be cropped to the
size given, it will be padded (so the returned result has the size asked).
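The crop-or-pad behaviour described above can be illustrated with a small NumPy sketch. This is not the library's implementation: the helper name center_crop_sketch is hypothetical, it only handles channel-first arrays, and padding with zeros is an assumption.

```python
from typing import Tuple

import numpy as np


def center_crop_sketch(image: np.ndarray, size: Tuple[int, int]) -> np.ndarray:
    """Center-crop a (n_channels, height, width) array to size = (height, width).

    If the image is smaller than the requested size along an axis, it is
    zero-padded symmetrically first (assumption), so the result always has
    the requested size.
    """
    crop_h, crop_w = size
    _, height, width = image.shape

    pad_h = max(crop_h - height, 0)
    pad_w = max(crop_w - width, 0)
    if pad_h or pad_w:
        image = np.pad(
            image,
            ((0, 0), (pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)),
        )
        _, height, width = image.shape

    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return image[:, top : top + crop_h, left : left + crop_w]


image = np.arange(3 * 4 * 6).reshape(3, 4, 6)
cropped = center_crop_sketch(image, (2, 2))  # plain center crop
padded = center_crop_sketch(image, (8, 2))   # height is too small, so it is padded
print(cropped.shape, padded.shape)
```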
Converts PIL.Image.Image to RGB format.
expand_dims
( image )
Expands 2-dimensional image to 3 dimensions.
flip_channel_order
( image )
Flips the channel order of image from RGB to BGR, or vice versa. Note that this will trigger a conversion of
image to a NumPy array if it’s a PIL Image.
normalize
( image, mean, std, rescale = False )
Parameters
-  image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to normalize.
-  mean (List[float] or np.ndarray or torch.Tensor) — The mean (per channel) to use for normalization.
-  std (List[float] or np.ndarray or torch.Tensor) — The standard deviation (per channel) to use for normalization.
-  rescale (bool, optional, defaults to False) — Whether or not to rescale the image to be between 0 and 1. If a PIL image is provided, scaling will happen automatically.
Normalizes image with mean and std. Note that this will trigger a conversion of image to a NumPy array
if it’s a PIL Image.
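Per-channel normalization can be sketched in NumPy as below. This is an illustration rather than the library's code: the helper name normalize_sketch is hypothetical, the input is assumed to be a channel-first array, and rescaling is assumed to divide 8-bit pixel values by 255.

```python
import numpy as np


def normalize_sketch(image, mean, std, rescale=False):
    """Normalize a (n_channels, height, width) array with per-channel mean and std."""
    image = np.asarray(image, dtype=np.float32)
    if rescale:
        # Assumption: 8-bit pixel values, mapped to the range [0, 1].
        image = image / 255.0
    # Reshape the statistics so they broadcast over height and width.
    mean = np.asarray(mean, dtype=np.float32)[:, None, None]
    std = np.asarray(std, dtype=np.float32)[:, None, None]
    return (image - mean) / std


image = np.full((3, 2, 2), 255, dtype=np.uint8)
normalized = normalize_sketch(image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], rescale=True)
print(normalized.shape, normalized.dtype)
```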
Rescale a numpy image by scale amount
resize
( image, size, resample = None, default_to_square = True, max_size = None ) → image
Parameters
-  image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to resize.
-  size (int or Tuple[int, int]) — The size to use for resizing the image. If size is a sequence like (h, w), output size will be matched to this. If size is an int and default_to_square is True, then image will be resized to (size, size). If size is an int and default_to_square is False, then the smaller edge of the image will be matched to this number, i.e., if height > width, then image will be rescaled to (size * height / width, size).
-  resample (int, optional, defaults to PILImageResampling.BILINEAR) — The filter to use for resampling.
-  default_to_square (bool, optional, defaults to True) — How to convert size when it is a single int. If set to True, the size will be converted to a square (size, size). If set to False, will replicate torchvision.transforms.Resize with support for resizing only the smallest edge and providing an optional max_size.
-  max_size (int, optional, defaults to None) — The maximum allowed for the longer edge of the resized image: if the longer edge of the image is greater than max_size after being resized according to size, then the image is resized again so that the longer edge is equal to max_size. As a result, size might be overruled, i.e., the smaller edge may be shorter than size. Only used if default_to_square is False.
Returns
image
A resized PIL.Image.Image.
Resizes image. Enforces conversion of input to PIL.Image.
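The interaction between size, default_to_square and max_size can be made concrete with a small pure-Python sketch that only computes the target (height, width). The helper name resize_output_size is hypothetical and the truncation to integers is an assumption; it follows the rules stated in the parameter descriptions rather than reproducing the library's code.

```python
from typing import Optional, Tuple, Union


def resize_output_size(
    height: int,
    width: int,
    size: Union[int, Tuple[int, int]],
    default_to_square: bool = True,
    max_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the (height, width) an image would be resized to."""
    if isinstance(size, (tuple, list)):
        return (size[0], size[1])
    if default_to_square:
        return (size, size)

    # Match the smaller edge to size and keep the aspect ratio.
    short, long = (width, height) if width <= height else (height, width)
    new_short, new_long = size, int(size * long / short)

    # If the longer edge exceeds max_size, shrink again so it equals max_size.
    if max_size is not None and new_long > max_size:
        new_short, new_long = int(max_size * new_short / new_long), max_size

    return (new_long, new_short) if width <= height else (new_short, new_long)


square = resize_output_size(480, 640, 256)
landscape = resize_output_size(480, 640, 256, default_to_square=False)
portrait = resize_output_size(640, 480, 256, default_to_square=False)
capped = resize_output_size(480, 640, 256, default_to_square=False, max_size=300)
print(square, landscape, portrait, capped)
```

In the last call the smaller edge ends up shorter than size, which is the case where max_size overrules size.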
rotate
( image, angle, resample = None, expand = 0, center = None, translate = None, fillcolor = None ) → image
Returns a copy of image, rotated the given number of degrees counterclockwise around its centre.
to_numpy_array
( image, rescale = None, channel_first = True )
Parameters
-  image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to convert to a NumPy array.
-  rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Will default to True if the image is a PIL Image or an array/tensor of integers, False otherwise.
-  channel_first (bool, optional, defaults to True) — Whether or not to permute the dimensions of the image to put the channel dimension first.
Converts image to a numpy array. Optionally rescales it and puts the channel dimension as the first
dimension.
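The rescaling default and the channel permutation can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: the helper name to_numpy_array_sketch is hypothetical, the input is assumed to be a (height, width, n_channels) array, and the scaling factor is assumed to be 1/255.

```python
import numpy as np


def to_numpy_array_sketch(image, rescale=None, channel_first=True):
    """Convert a (height, width, n_channels) array to a NumPy array, optionally rescaled."""
    image = np.asarray(image)
    if rescale is None:
        # Default: rescale only integer-valued images.
        rescale = np.issubdtype(image.dtype, np.integer)
    if rescale:
        image = image.astype(np.float32) / 255.0
    if channel_first and image.ndim == 3:
        # (height, width, n_channels) -> (n_channels, height, width)
        image = image.transpose(2, 0, 1)
    return image


image = np.zeros((4, 6, 3), dtype=np.uint8)
image[..., 0] = 255  # first channel fully saturated
array = to_numpy_array_sketch(image)
print(array.shape, array.dtype)
```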
to_pil_image
( image, rescale = None )
Parameters
-  image (PIL.Image.Image or numpy.ndarray or torch.Tensor) — The image to convert to the PIL Image format.
-  rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to True if the image type is a floating type, False otherwise.
Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if
needed.