np.genfromtxt() Function in Python [5 Examples]

In this NumPy article, I will explain the np.genfromtxt() function in Python, its syntax, the parameters required, and the return values. I will also explain some examples related to the use cases of the np.genfromtxt() function.

The np.genfromtxt() function in Python is a function from NumPy. This function is widely used for reading data from text files, especially when the data is in a tabular format with rows and columns.

NumPy genfromtxt Function Syntax

The basic syntax of np.genfromtxt() function in Python is as follows:

numpy.genfromtxt(fname, dtype=float, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=None, replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=False, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes')

genfromtxt Function in NumPy Python Parameters

The np.genfromtxt() function in Python has several parameters, each of which lets us specify how the data should be read and interpreted:

  1. fname: The file name or a file-like object to read. It can be a string, a pathlib.Path object, a list of strings, or a generator.
  2. dtype: The data type of the resulting array. If None, the dtypes will be determined by the contents of each column, individually.
  3. comments: The character used to indicate the start of a comment.
  4. delimiter: The string used to separate values. By default, any whitespace acts as a delimiter. An integer or a sequence of integers is interpreted as the width(s) of fixed-width fields.
  5. skip_header: The number of lines to skip at the beginning of the file.
  6. skip_footer: The number of lines to skip at the end of the file.
  7. converters: A dictionary mapping a column number (or name) to a function that converts the text in that column to a value.
  8. missing_values: A set of strings corresponding to missing data.
  9. filling_values: The set of values to use as default when data is missing.
  10. usecols: Which columns to read, with 0 being the first.
  11. names: If True, the column names are read from the first line after the first skip_header lines. It can also be a sequence of names or a single comma-separated string of names.
  12. excludelist: A list of names to exclude. This list is appended to the default list ['return', 'file', 'print'].
  13. deletechars: A string combining invalid characters that must be deleted from the names.
  14. replace_space: The character to use for replacing white spaces in the names of variables.
  15. autostrip: Whether to automatically strip white spaces from the variables.
  16. case_sensitive: If set to True, field names will be case-sensitive.
  17. defaultfmt: A format used to define default field names.
  18. unpack: If True, the returned array is transposed.
  19. usemask: If True, return a masked array.
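  20. loose: If True (the default), invalid values do not raise errors. If False, an error is raised when a value cannot be converted.
  21. invalid_raise: If True (the default), an exception is raised when a line has an inconsistent number of columns. If False, a warning is emitted and the offending lines are skipped.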
  22. max_rows: The maximum number of rows to read.
  23. encoding: Encoding of the input file.
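
As a rough illustration of how a few of these parameters combine, the following minimal sketch reads a small in-memory table. The data, the 'N/A' marker, and the fill value of -1.0 are hypothetical and chosen only for the example.

```python
from io import StringIO
import numpy as np

# Hypothetical in-memory data: one title line, then three columns with gaps.
text = StringIO(
    "sample readings\n"
    "1,2.5,10\n"
    "2,,20\n"
    "3,4.5,N/A\n"
)

data = np.genfromtxt(
    text,
    delimiter=",",
    skip_header=1,          # skip the title line
    usecols=(1, 2),         # keep only the second and third columns
    missing_values="N/A",   # treat this string as missing (empty fields already are)
    filling_values=-1.0,    # value substituted for missing entries
)
print(data)
```

Each missing entry, whether an empty field or the 'N/A' marker, should come back as -1.0 in the resulting two-column float array.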

numpy.genfromtxt() Function in Python Return Values

The np.genfromtxt() function in Python returns an array, by default a NumPy array. If the usemask is True, it returns a masked array. This array will have the shape and data type as specified by the input parameters and the contents of the file.
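
To make the difference between the two return types concrete, here is a small sketch that reads the same hypothetical text twice, once with the default settings and once with usemask=True.

```python
from io import StringIO
import numpy as np

# Hypothetical data with one empty field in the second row.
text = "1,2,3\n4,,6\n"

plain = np.genfromtxt(StringIO(text), delimiter=",")
masked = np.genfromtxt(StringIO(text), delimiter=",", usemask=True)

print(type(plain).__name__)    # regular array; the empty field becomes nan
print(type(masked).__name__)   # masked array; the empty field is masked
print(masked.mask)             # True marks the missing entry
```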

Python genfromtxt() function in NumPy use cases

Let's look at some of the use cases of the np.genfromtxt() function in Python:

1. genfromtxt Python working on Comma delimited file with mixed dtype

To read a file whose columns have different data types (e.g., integers, floats, strings), we can call the np.genfromtxt() function in Python with dtype=None and let NumPy determine the appropriate data type for each column automatically.

import numpy as np

data = np.genfromtxt('C:/Users/kumar/OneDrive/Desktop/Book1.csv', delimiter=',', dtype=None, names=True, encoding='utf-8')
print(data)

Output:

[(1, 'Jade', 40000.12) (2, 'David', 35000.25) (3, 'Alex', 42000.35)
 (4, 'Betty', 38000.18) (5, 'Zoe', 45000.24)]
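
The result is a structured array, so each column can be selected by its field name. The sketch below uses an in-memory stand-in for the CSV file. The column names ID, Name, and Salary are assumptions, since the file's header row is not shown here.

```python
from io import StringIO
import numpy as np

# Hypothetical stand-in for the CSV file (header names are assumed).
csv_text = StringIO(
    "ID,Name,Salary\n"
    "1,Jade,40000.12\n"
    "2,David,35000.25\n"
    "3,Alex,42000.35\n"
)

data = np.genfromtxt(csv_text, delimiter=",", dtype=None, names=True, encoding="utf-8")

print(data.dtype.names)        # field names taken from the header row
print(data["Name"])            # one column, selected by name
print(data["Salary"].mean())   # numeric columns support the usual array operations
```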

2. np.genfromtxt CSV Using dtype = None

With dtype=None, NumPy automatically detects and assigns the data types for each column in the dataset, which is especially useful when the specific data types of the columns are not known in advance.

For instance:

import numpy as np

data = np.genfromtxt('C:/Users/kumar/OneDrive/Desktop/Book1.csv', delimiter=',', dtype=None)
print(data)

Output:

[(b'\xef\xbb\xbf0', 11111, 45000) (b'1', 16041, 40000)
 (b'2', 16043, 35000) (b'3', 16029, 42000) (b'4', 16061, 38000)
 (b'5', 16039, 45000)]
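
In the output above, the first field reads b'\xef\xbb\xbf0'. Those three leading bytes are the UTF-8 byte order mark (BOM) at the start of the file, and they cause the whole first column to come back as byte strings instead of integers. One possible way to avoid this, assuming the file is UTF-8 with a BOM, is to pass encoding='utf-8-sig'. The temporary file and its contents in this sketch are hypothetical stand-ins for the real CSV.

```python
import os
import tempfile
import numpy as np

# Write a small CSV that starts with a UTF-8 byte order mark (hypothetical data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8-sig"
) as f:
    f.write("0,11111,45000\n1,16041,40000\n")
    path = f.name

try:
    # 'utf-8-sig' decodes the file as UTF-8 and drops the leading BOM.
    data = np.genfromtxt(path, delimiter=",", dtype=None, encoding="utf-8-sig")
finally:
    os.remove(path)

print(data)
```

With the BOM removed, every column holds integers, so the result should be an ordinary two-dimensional integer array rather than a structured one.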


3. genfromtxt NumPy function in Python with specified dtype and names

Here, we define the data types and column names for the dataset explicitly. This gives control over how each column is interpreted, so that each column is read with the intended data type and identified by a specific name.

For example:

import numpy as np

dtype = [('id', int), ('name', 'U10'), ('value', float)]
names = ['ID', 'Name', 'Value']

data = np.genfromtxt('C:/Users/kumar/OneDrive/Desktop/Book1.csv', delimiter=',', dtype=dtype, names=names, skip_header=1)
print(data)

Output:

[(1, 'Jade', 40000.12) (2, 'David', 35000.25) (3, 'Alex', 42000.35)
 (4, 'Betty', 38000.18) (5, 'Zoe', 45000.24)]
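
Note that the example passes field names twice: lowercase ones inside dtype and capitalized ones through names. The following sketch, which uses hypothetical in-memory data and adds an explicit encoding, suggests that the names argument is the one that ends up on the resulting array.

```python
from io import StringIO
import numpy as np

# Hypothetical stand-in for the CSV file, including a header row to skip.
csv_text = StringIO("ID,Name,Value\n1,Jade,40000.12\n2,David,35000.25\n")

dtype = [("id", int), ("name", "U10"), ("value", float)]

data = np.genfromtxt(
    csv_text,
    delimiter=",",
    dtype=dtype,
    names=["ID", "Name", "Value"],  # replaces the field names given in dtype
    skip_header=1,                  # skip the header row in the data
    encoding="utf-8",
)

print(data.dtype.names)
print(data["Name"])
```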


4. np genfromtxt function in Python with fixed-width columns

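When delimiter is an integer or a sequence of integers, np.genfromtxt() treats the values as field widths in characters and slices each line at those positions instead of splitting on a separator. This is particularly useful for parsing data laid out in columns of fixed character widths. Note that the example below applies widths of 4, 9, and 5 characters to the comma-separated Book1.csv, so the output shows each line cut at fixed positions, commas and line endings included, rather than at field boundaries.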

For instance:

import numpy as np

data = np.genfromtxt('C:/Users/kumar/OneDrive/Desktop/Book1.csv', delimiter=[4, 9, 5], dtype=None, names=['ID', 'Name', 'Score'], encoding='utf-8')
print(data)

Output:

[('\ufeffID,', 'Name Code', ',Sala') ('1,Ja', 'de,40000.', '12\n')
 ('2,Da', 'vid,35000', '.25\n') ('3,Al', 'ex,42000.', '35\n')
 ('4,Be', 'tty,38000', '.18\n') ('5,Zo', 'e,45000.2', '4\n')]
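
For comparison, here is a minimal sketch of the same widths applied to data that really is laid out in fixed-width columns. The rows are hypothetical, and autostrip=True is added to trim the padding spaces from each field.

```python
from io import StringIO
import numpy as np

# Hypothetical fixed-width data: 4 characters for ID, 9 for Name, 5 for Score.
text = StringIO(
    "1   Jade     88.5 \n"
    "2   David    92.0 \n"
    "3   Alex     79.25\n"
)

data = np.genfromtxt(
    text,
    delimiter=[4, 9, 5],            # field widths in characters
    dtype=None,                     # infer a type for each column
    names=["ID", "Name", "Score"],
    autostrip=True,                 # strip padding spaces from each field
    encoding="utf-8",
)

print(data["Name"])
print(data["Score"])
```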


5. np.genfromtxt() Function in Python to skip comments

The comments parameter sets the character (or string) that marks the start of a comment, typically an annotation or description in the data file. Everything from that character to the end of the line is ignored, so lines that begin with it are not processed as data.

For example:

import numpy as np

data = np.genfromtxt('test.txt', delimiter=[4, 9, 5], dtype=None, comments='#', encoding='utf-8')
print(data)

Output:

[['1,5.' '2,Python\n' 'False']
 ['2,3.' '5,Guides' 'False']]
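
The example above combines comments with fixed-width delimiters, which makes the effect of the comment character hard to see. The following sketch isolates it, using hypothetical comma-separated data that contains full-line and trailing comments.

```python
from io import StringIO
import numpy as np

# Hypothetical data with a leading comment, a trailing comment, and a
# full-line comment in the middle.
text = StringIO(
    "# id, score\n"
    "1,5.2\n"
    "2,3.5  # trailing remark\n"
    "# a full-line comment\n"
    "3,4.0\n"
)

data = np.genfromtxt(text, delimiter=",", comments="#")
print(data)
```

Only the three data rows should remain, with the text after each '#' discarded.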


Conclusion

In this article, we delved into the details of the np.genfromtxt() function in Python, exploring its syntax, the required parameters, and the types of values it returns. Additionally, we examined several practical use cases of the np.genfromtxt() function within the NumPy library, demonstrating its versatility in handling different data formats and structures.
