Data Upload

Welcome to the PumasCP data upload guide. Data upload is the starting point for analysis and report generation in PumasCP. You can upload several kinds of data, including single-dose and multiple-dose studies. This section covers the upload process, supported formats, and best practices.

Prerequisites

Before uploading data, make sure the following prerequisites are met:

Data Preparation:

Before uploading, ensure that your data is formatted correctly and uses one of the supported data formats. Clean and preprocess your data to remove any inconsistencies, errors, or invalid entries that may affect the analysis.

File Size Limitations:

PumasCP accepts datasets of up to 500 MB. Your organization's IT policies may impose a lower file size limit. If necessary, split large datasets into smaller files to meet these restrictions.
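As an illustration of splitting a large dataset, the following Python sketch breaks a CSV file into numbered parts that each repeat the header row. The function name `split_csv`, the output naming scheme, and the reading of 500 MB as 500 × 1024 × 1024 bytes are assumptions made for this example; they are not part of PumasCP.

```python
import csv
from pathlib import Path

# Assumption: the 500 MB limit is read as 500 * 1024 * 1024 bytes.
MAX_BYTES = 500 * 1024 * 1024


def split_csv(src, out_dir, max_bytes=MAX_BYTES):
    """Split `src` into numbered CSV parts, each starting with the header row.

    Part sizes are estimated from the encoded length of each row (quoting is
    ignored), so keep `max_bytes` somewhat below the real limit.
    """
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        header_size = len((",".join(header) + "\r\n").encode("utf-8"))
        out, writer, size = None, None, 0
        for row in reader:
            row_size = len((",".join(row) + "\r\n").encode("utf-8"))
            # Start a new part when the current one would exceed the limit.
            if out is None or size + row_size > max_bytes:
                if out is not None:
                    out.close()
                path = out_dir / f"{src.stem}_part{len(parts) + 1}.csv"
                parts.append(path)
                out = path.open("w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                size = header_size
            writer.writerow(row)
            size += row_size
        if out is not None:
            out.close()
    return parts
```

Each resulting part can then be uploaded as a separate dataset.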

Network Connectivity:

For the online version, ensure a stable internet connection to prevent interruptions during the data upload process.

Supported Data Formats

The PumasCP data upload feature supports various common formats, allowing easy dataset import for analysis and reporting. Users can rename datasets and add notes post-upload for convenience. Supported formats include:

File type	File extension	Notes
Comma-Separated Values	CSV	Widely used for tabular data; rows are lines, columns separated by commas.
Microsoft Excel Open XML Spreadsheet	XLSX	Excel spreadsheet files, containing multiple sheets of tabular data. XLS is not supported.
Tab-Separated Values	TSV	Similar to CSV, but uses tabs as delimiters.
Stata Data File	DTA	Native format for Stata statistical software for storing datasets.
SAS Binary Encoded Data	SAS7BDAT	Binary data format used by SAS software for storing datasets.
SAS Transport File	XPT	Used for transporting datasets between different SAS systems or versions.
SPSS Portable File	POR	Used by SPSS for storing datasets in a portable format.
SPSS Data File	SAV	Format used by SPSS for storing datasets.

Please note that there may be restrictions or limitations on file size or encoding for certain formats.
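A quick check before uploading can catch unsupported files early. The sketch below compares a file's extension against the extensions listed in the table above. The helper name `is_supported` is hypothetical, and case-insensitive matching is an assumption; PumasCP may perform its own validation differently.

```python
from pathlib import Path

# Extensions taken from the supported-formats table in this guide.
SUPPORTED_EXTENSIONS = {"csv", "xlsx", "tsv", "dta", "sas7bdat", "xpt", "por", "sav"}


def is_supported(filename):
    """Return True if the file's extension is in the supported list.

    Assumption: extensions are matched case-insensitively.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    return ext in SUPPORTED_EXTENSIONS
```

For example, `is_supported("legacy.xls")` returns `False`, consistent with the note that XLS is not supported.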

Note

When uploading XLSX files, an empty column marks the end of the data that is loaded from the file, even if later columns contain data. Store individual tables in separate sheets rather than within the same sheet.
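The effect of this rule can be illustrated on a sheet represented as a list of rows. This is only a sketch of the behaviour described in the note, not PumasCP's implementation; the function name `columns_loaded` is hypothetical, and treating a column as empty only when every cell (including the header cell) is `None` or `""` is an assumption.

```python
def columns_loaded(rows):
    """Return the part of a sheet that lies left of the first empty column.

    `rows` is a list of equal-length lists representing one sheet.
    Assumption: a column is empty when every cell is None or "".
    """
    if not rows:
        return []
    for col in range(len(rows[0])):
        if all(row[col] in (None, "") for row in rows):
            # Everything from this column onwards is ignored.
            return [row[:col] for row in rows]
    return [row[:] for row in rows]


# Two tables side by side, separated by an empty column.
sheet = [
    ["id", "conc", "", "site"],
    [1, 0.5, None, "A"],
    [2, 0.7, "", "B"],
]
loaded = columns_loaded(sheet)  # only the "id" and "conc" columns remain
```

In this example the `site` column is dropped, which is why separate tables belong in separate sheets.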

Data Types and Structures

When uploading data to PumasCP, it's essential to understand the expected data types for each column or field in your dataset. The software supports various data types, including:

Data Type	Notes	Examples
Integer (Int)	Whole numbers without decimal points.	`-1`, `10`, `-5`, `0`
Number (Floating Point)	Numbers with decimal points.	`-4.0055`, `0.922`
String	Sequences of characters, used for text data.	`"hello"`, `"world"`, `"123"`, `"2019-08-08"`
Date	Date values in various formats.	`2019-08-08`, `08/08/2019`, `08-08-2019`

Note

If a column type has a '?' as a suffix, it means that there are some missing/empty values in the column along with values of the parent type, e.g. String?, Number?, and Integer?.
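To make the type labels concrete, the sketch below guesses a label for a column of raw text cells and appends '?' when empty cells are present. It is a simplified illustration under stated assumptions, not PumasCP's detection logic: the function name `infer_column_type` is hypothetical, dates are not handled, and parsing follows Python's `int` and `float` rules, which may accept inputs that PumasCP would treat differently.

```python
def infer_column_type(values):
    """Guess a type label (Integer, Number, or String) for a column.

    Assumption: None and "" count as missing and add the '?' suffix.
    """
    present = [v for v in values if v not in (None, "")]
    has_missing = len(present) < len(values)

    def all_parse(cast):
        # True when every non-missing cell can be converted with `cast`.
        try:
            for v in present:
                cast(v)
            return True
        except ValueError:
            return False

    if present and all_parse(int):
        base = "Integer"
    elif present and all_parse(float):
        base = "Number"
    else:
        base = "String"
    return base + ("?" if has_missing else "")
```

Under these assumptions, a column containing `0.922`, an empty cell, and `-4.0055` is labelled `Number?`, while a column containing `1` and `NA` is labelled `String`.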

Data Upload Workflow

Uploading a Dataset

1. Navigate to the study page and click on the Upload Dataset button. This action opens the file upload dialog.
2. In the dialog, users can either:
   - Click on the file upload button to select a file from their disk.
   - Drag and drop a file from their local disk into the dialog.
3. After selecting the file, a preview window appears. It displays the data on the left and upload options on the right panel.
4. Users can name the uploaded file and customize the upload settings based on the file type using the options in the right panel.

Note

Users can only upload one data file at a time. To upload multiple files, repeat the process for each file.

Cancelling the Upload

To interrupt the upload process, users can click on Cancel, close the dialog with the X, or press Esc on the keyboard.

Completing the Upload

Clicking the Done button uploads the file to PumasCP, tags it as original, and timestamps it.

Note

During upload, the original precision of numbers is maintained. The preview may show values shortened to 3 significant digits for readability, but full precision is used in subsequent computations.

CSV Customisation Options

1. Data Starts from Row number

The "Data starts from row number" option is available when customising CSV files. Datasets often begin with header rows or metadata that describe the content of the data, such as variable names, units, or other annotations. This option lets users specify where the actual data starts within the dataset, so that the data is imported accurately.

It is particularly useful for datasets that have a variable number of header rows, or when the user wants to skip certain rows before the actual data starts. With this information, PumasCP can read and parse the data correctly, ignoring any header rows or metadata that precede it.

By default, this option is set to auto, but it can be changed to any positive integer:

1 = The header is repeated as data. Any column names that were identified are brought down into the first row of the dataset, and the column names are auto-generated as Column1, Column2, ..., ColumnN. The row_length therefore increases by 1.
2 = Same as auto. The header is identified and used as column names.
3, 4, ..., n = The number of rows retained is (row_length - n + 2). Rows are retained counting from the end of the dataset, and the count includes the column headers.
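The arithmetic above can be written as a one-line function. This sketch only restates the formula given in this guide; the name `rows_retained` is hypothetical, and `row_length` is taken to mean the row count obtained with the auto setting, as the text implies.

```python
def rows_retained(row_length, start_row):
    """Apply the guide's formula: row_length - n + 2, where n is the start row.

    Assumption: `row_length` is the row count under the auto setting.
    """
    if start_row < 1:
        raise ValueError("start_row must be a positive integer")
    return row_length - start_row + 2


# Worked values for a dataset with row_length = 100.
header_as_data = rows_retained(100, 1)  # 101: one more row than auto
same_as_auto = rows_retained(100, 2)    # 100: unchanged
skip_three = rows_retained(100, 5)      # 97: three fewer rows than auto
```

Note that the same formula reproduces the special cases for 1 and 2: a start row of 1 adds one row, and a start row of 2 leaves the row count unchanged.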

2. Specify values as `missing`

The "Specify values as missing" option lets users explicitly mark certain values in their dataset as missing or undefined. Real-world datasets often contain placeholder values that stand for missing data, for example because of data collection errors or incomplete records. Marking these values as missing ensures that they are handled appropriately during data analysis and modeling, so that they do not skew results or cause errors in calculations.

The following options are available for specifying missing or empty values. The table shows the effect of each option on the different types of columns in the dataset.

Options	Integer? (Integer + missing)	Number? (Float + missing)	String? (String + missing)
`.`	String	String	String
`Empty String`	Integer?	Number?	String?
`NA`	String	String	String
`na`	String	String	String
`NULL`	String	String	String
`null`	String	String	String
`<LOQ`	String	String	String
`<=LOQ`	String	String	String
`>LOQ`	String	String	String
`Broken Sample`	String	String	String
`Missing Sample`	String	String	String

Selecting any of these options casts the column type to String, except Empty String, which retains the existing column type.
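The table can be read as a simple lookup, sketched below. The function name `resulting_type` is hypothetical and the code merely encodes the table in this guide; it does not call or reproduce any PumasCP API.

```python
# Options and column types copied from the table in this guide.
MISSING_OPTIONS = {
    ".", "Empty String", "NA", "na", "NULL", "null",
    "<LOQ", "<=LOQ", ">LOQ", "Broken Sample", "Missing Sample",
}
COLUMN_TYPES = {"Integer?", "Number?", "String?"}


def resulting_type(column_type, option):
    """Return the column type listed in the table for a given option."""
    if column_type not in COLUMN_TYPES:
        raise ValueError(f"unknown column type: {column_type}")
    if option not in MISSING_OPTIONS:
        raise ValueError(f"unknown option: {option}")
    # Only Empty String keeps the existing type; every other option gives String.
    return column_type if option == "Empty String" else "String"
```

For example, `resulting_type("Number?", "NA")` returns `"String"`, while `resulting_type("Number?", "Empty String")` returns `"Number?"`.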

3. Add Notes

During the file upload process, users can enter notes to provide additional context or information about the dataset. These notes can include details such as the source of the data, data collection methods, or any other relevant information that may be useful for other users or for future reference.