Sometimes the hardest part of learning a new topic is learning the terminology or jargon that those within the community commonly use. Here is a table of some terms that are common, but may be unfamiliar to someone new to bioinformatics. Some of these are my attempt to define abstract terms. If you want any terms defined or added to the list, or you feel the definitions are inaccurate, please contact me.
Importantly, while some terms technically have different meanings, they are often used synonymously. I have tried to indicate these terms with a matching number of asterisks.
Term | Definition |
---|---|
Terminal* | The window in which you type commands to be interpreted by the shell. |
Console* | Similar to terminal, but full screen with no graphical component. |
Shell* | The program that interprets commands typed into a terminal. In your terminal you can type `echo $SHELL` to check your default (login) shell. |
Bash | A common shell program. |
Command line* | The location where commands are typed within the terminal window. |
Command prompt* | The information displayed on the command line before the cursor |
Command | An instruction given to the computer to perform an operation (e.g., opening a file or adding a row to a data table). The way a command is given to the computer is often dictated by the programming language in which it is to be interpreted (known as that language's syntax). |
Script | A plain-text file containing a set of commands written in a particular programming language's syntax, to be run by the shell or another interpreter (e.g., Python). When interpreted, the commands are executed in order. |
Program | A set of code designed for a specific task. Similar to a command, but more general (i.e., a program is not limited to the shell's scripting syntax). |
Library | Files containing general-purpose code that can be reused by many different programs. |
Dependency | A program or library that is required for another program to run. |
Package | A program and all of its dependencies. |
Module | Similar to library. |
Argument | An option or value (e.g., an input file name) specified on the command line when running a program or command. |
Directory | A named location on a computer that contains files and/or other directories. |
File | A named location on a computer that contains data, commonly in the form of plain text. |
Repository/Repo | A term from git: a directory of files, possibly including code, documentation, or data, whose history of changes is tracked by git. |
File system | The way in which files and directories are organized in a nested, tree-like structure. |
Root | The base of the file system tree, in which all directories and files are stored. Critical system files are stored close to the root. On Unix-like systems it is written as /, and modifying files near the root usually requires administrator privileges. |
Home | A user's home directory is the directory belonging to that user, where they have read, write, and execute permissions. It is often abbreviated as ~. |
Path | The location of a file or directory within the file system, with directories separated by slash characters (/). |
Absolute path | The full name of a file or directory that includes all directories and sub-directories starting from the root of the file system to the specified file or directory. |
Relative path | The location of a file or directory given relative to the user's current location (the current working directory) rather than the root. The special name .. refers to the parent directory. |
Operating system | The software that interfaces between the computer's hardware and other user-facing software. |
Server | A computer set up for users to connect to and work on remotely. Servers usually have more resources than personal computers, so they can handle resource-intensive commands and multiple users. |
Cluster | An interconnected collection of servers set up so that users can connect to one of them and submit resource-intensive commands, which are then distributed to the others based on available resources. |
Node | One computer within a cluster. |
Login node | The node within the cluster which users connect to and interact with. |
Head node | The node within the cluster which handles job scheduling and resource allocation. |
Job | A submitted command or set of commands passed from the user to the job scheduler on a cluster. |
Text editor | A program on a computer (graphical, or terminal-based such as nano or vim) that displays plain text files and allows them to be edited. |
Integrated Development Environment (IDE) | A graphical program on a computer that contains tools to aid development in a specific programming language (e.g. RStudio for R). |
RStudio | An IDE built around the R programming language. |
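
The Argument entry can be made concrete with a minimal Python sketch of a script that accepts arguments. The script name `count_reads.py`, its file argument, and its `--min-length` option are hypothetical and exist only for illustration. `argparse` is Python's standard-library module for parsing command-line arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an argument parser for a hypothetical script, count_reads.py."""
    parser = argparse.ArgumentParser(prog="count_reads.py")
    # Positional argument: the input file the script should read.
    parser.add_argument("fastq", help="path to a FASTQ file")
    # Optional argument (an "option") with a default value.
    parser.add_argument("--min-length", type=int, default=0,
                        help="ignore reads shorter than this (hypothetical)")
    return parser


# Equivalent to typing this on the command line:
#   python count_reads.py sample1.fastq --min-length 50
args = build_parser().parse_args(["sample1.fastq", "--min-length", "50"])
print(args.fastq, args.min_length)  # sample1.fastq 50
```

If `--min-length` is left off the command line, `args.min_length` falls back to its default of 0.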
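
The difference between absolute and relative paths can be sketched with Python's standard-library `pathlib`. The directory and file names below are hypothetical. `PurePosixPath` only manipulates path strings with / as the separator and never touches the disk.

```python
from pathlib import PurePosixPath

# Hypothetical current working directory (where the user "is" right now).
cwd = PurePosixPath("/home/alice/project")

# A relative path is interpreted starting from the current directory.
relative = PurePosixPath("data/reads.fastq")

# Joining it onto the current directory gives the absolute path,
# which starts at the root of the file system (/).
absolute = cwd / relative
print(absolute)                # /home/alice/project/data/reads.fastq
print(absolute.is_absolute())  # True
print(relative.is_absolute())  # False

# Going the other way: express the absolute path relative to cwd.
print(absolute.relative_to(cwd))  # data/reads.fastq
```

On a real system, `Path.cwd()` returns the actual current directory. `Path.resolve()` produces an absolute path and also collapses `..` components.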

Here is a table of some common file formats used in bioinformatics.

Format | Use | Link | Specs |
---|---|---|---|
.txt | General plain text. The contents may be formatted in some way that this extension does not specify (e.g., a .txt file could contain comma-separated values). | Wikipedia | NA |
.csv | General data storage with rows and columns. Columns are separated by commas. | Wikipedia | Link |
.tab/.tsv | General data storage with rows and columns. Columns are separated by tabs. | Wikipedia | NA |
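FASTA | Stores nucleotide or amino acid sequences. Each record starts with a header line beginning with `>`, followed by one or more lines of sequence. | Wikipedia | NA |
FASTQ | Stores sequence data (typically sequencing reads) and a quality score for each base. | Wikipedia | Link |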
SAM | Sequence Alignment Map format. Stores information about reads mapped to a reference genome. | Wikipedia | Link |
BAM | Binary Alignment Map format. The compressed binary version of SAM format. | Wikipedia | Link |
CRAM | Another compressed format to store read mapping information. It is typically smaller than BAM because reads are stored relative to a reference sequence. | Wikipedia | Link |
VCF | Variant Call Format. Used to store information about variants called in one or more samples. .vcf files are a specific type of tab delimited format. | Wikipedia | Link |
BCF | Binary Variant Call Format. The binary, compressed version of VCF. | Wikipedia | Link |
BED | Stores coordinates of regions of interest. .bed files are a specific type of tab delimited format. | Wikipedia | Link |
GFF | Stores annotation information from a genome. .gff files are a specific type of tab delimited format. | Wikipedia | Link |
GTF | Stores annotation information from a genome. GTF is based on an earlier version of GFF (GFF2) but is still commonly used, notably by the Ensembl database. .gtf files are a specific type of tab delimited format. | Wikipedia | Link |
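
Because FASTA is plain text, a small parser shows its structure clearly: header lines start with `>`, and the sequence lines that follow belong to that header. The sketch below is for illustration only. The function name `read_fasta` and the example records are hypothetical, and duplicate headers would overwrite each other. For real work, an established library such as Biopython's `SeqIO` is a safer choice.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dict."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; store the previous one, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]  # header text without the leading '>'
            chunks = []
        else:
            # Sequence lines are concatenated, since one sequence
            # may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # store the final record
    return records


example = """>seq1 example sequence
ACGTACGT
ACGT
>seq2
GGCC
"""
print(read_fasta(example.splitlines()))
# {'seq1 example sequence': 'ACGTACGTACGT', 'seq2': 'GGCC'}
```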
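
A FASTQ record usually spans four lines: an `@` header, the sequence, a `+` separator line, and a quality string with one character per base. This sketch assumes the common Phred+33 encoding, in which a base's quality score is the character's ASCII code minus 33. Some older Illumina data used a different offset. The function name `read_fastq` is hypothetical, and the sketch assumes exactly four lines per record. That layout is typical but not guaranteed by every tool.

```python
from typing import Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read ID, sequence, quality scores) for each FASTQ record.

    Assumes exactly four lines per record and Phred+33 quality encoding.
    Incomplete trailing records are ignored.
    """
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (ln.rstrip("\n") for ln in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # Phred+33: quality score = ASCII code of the character minus 33.
        scores = [ord(c) - 33 for c in qual]
        yield header[1:], seq, scores


example = ["@read1", "ACGT", "+", "II#5"]
for read_id, seq, scores in read_fastq(example):
    print(read_id, seq, scores)  # read1 ACGT [40, 40, 2, 20]
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10). A score of 40 therefore means roughly a 1 in 10,000 chance that the base call is wrong.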
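
BED is tab delimited, with at least three columns: chromosome, start, and end. BED coordinates are 0-based and half-open, meaning the first base is position 0 and the end position is not included. A region's length is therefore simply end minus start. The sketch below reads the first three columns with Python's standard `csv` module and a tab delimiter. The same approach works for other tab delimited formats such as .tsv. The function name `read_bed` and the example regions are hypothetical.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_bed(handle: TextIO) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) for each region in a BED file handle."""
    regions = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip empty lines, comments, and optional track/browser lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        # Only the first three (required) columns are used here.
        regions.append((row[0], int(row[1]), int(row[2])))
    return regions


# io.StringIO stands in for an open file in this example.
bed_text = "chr1\t0\t100\tregionA\nchr2\t1000\t1500\tregionB\n"
for chrom, start, end in read_bed(io.StringIO(bed_text)):
    # 0-based, half-open coordinates: length is end - start.
    print(chrom, start, end, end - start)
# chr1 0 100 100
# chr2 1000 1500 500
```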