Arjun Narayan edited this page Mar 13, 2017 · 14 revisions

Cockroach Labs Tech words for non-tech people

This page focuses on words that you won't find explained elsewhere on our blog. It is for you if:

  • you want to understand the chatter between devs
  • you're an external contributor to CockroachDB and want to ensure you're not missing anything
  • you're just starting at Cockroach Labs
  • you've been at Cockroach Labs for a while but somehow missed an explanation and have never dared to ask since

If you wish to have some more words explained, just ask!

The definitions should be given in Simple English. If you find them difficult to understand, ask for clarification!

  • Aggregation: an operation that a client app can perform in SQL to summarize a lot of data into a single result (e.g. counting rows)
  • Aphyr: the usual name Kyle Kingsbury goes by
  • AWS (Amazon Web Services): Amazon's Cloud hosting
  • Azure: Microsoft's Cloud hosting
  • Bikeshed: many engineers spending a lot of time debating a minor issue - see the story here
  • Blue: test cluster on Azure
  • Cassandra: another DB product we hear about often
  • Chaos: testing method that stops nodes in a test cluster unpredictably
  • Chaos monkey: program that performs chaos testing
  • CI (Continuous integration): program that runs tests and produces reports automatically in the background
  • Cloud: someone else's computer
  • Code yellow: moving an issue to top company priority (idea comes from Google). During a code yellow, any task pertaining to the code yellow takes precedence over all other tasks.
  • Cutting the release: selecting one particular version of the product to publish
  • Data sovereignty: the demand from some apps/companies to have data located in specific places geographically, for example a requirement that data about EU citizens be hosted in the EU
  • Delta: test cluster on GCE
  • Denormalization: An explicit copy of some normalized data in a different format, in order to enable faster access. Examples of denormalized data: indexes, materialized views, and anything else that copies "base" data into a different format to speed up operations that aren't by primary key.
  • Encryption at rest: keeping the data encrypted while it is stored in the database, not only while it is being sent to clients that query it
  • Gamma: test cluster on GCE
  • GCE (Google Compute Engine): Google's Cloud hosting
  • Geospatial index: An index that is efficient for storing 2D coordinates (such as lat/long), such that two points that are close together on the (lat/long) map are stored relatively close together in the index ordering. What makes a geospatial index special is that it preserves this "closeness" when going down from two dimensions (lat/long) to one dimension (the index). Usually achieved with a space-filling curve.
  • GIS (Geographical Information System): stuff needed to put geographical coordinates in a database
  • Git: a tool and database to store and share source code
  • Index: A copy of some parts of a database table, ordered to make lookups by the index columns very quick. There is always a "primary index", ordered by primary key, which makes looking up a row by its primary key very fast. Other indexes are called "secondary indexes", and are ordered by some other criteria (could be some other columns, or even combinations of columns, or even combinations of columns from different tables). An index is a denormalization.
  • Jepsen: a tool that tests databases in a harsh way, made by Aphyr; also the name of Aphyr's blog about database testing
  • Merge: the action of accepting a PR to the main product
  • Mongo: short for MongoDB, another DB product we hear about often
  • Normalization: the process of reducing copies of data as much as possible, so that the same information is not stored in several places (which would risk errors if some copies are updated and others are not). See Wikipedia. Usually contrasted with explicit denormalization.
  • OLAP (Online Analytical Processing): a class of applications where the most common queries are long and touch most of the data at a time with complex computations; contrast with OLTP
  • OLTP (Online Transaction Processing): a class of applications where the most common queries are short and touch a bit of data at a time with simple computations; contrast with OLAP
  • ORM (Object-Relational Mapping): a piece of software used by an app to access a DB, letting the app work with rows in the DB as if they were objects in its own programming language
  • PR (Pull Request): a proposal for a change to the source code submitted for review to colleagues. See "merge"
  • Production monkey: person deploying new versions and maintaining test clusters. The role usually rotates, with each rotation lasting a week, to encourage all engineers to be familiar with running a full cluster.
  • Range: a portion of the data in a DB. In other distributed databases called a "shard", "chunk", or "tablet".
  • Reg cluster: short for "registration cluster", a production cluster on Google Cloud storing our usage stats
  • Replication factor: how many copies there are of each Range in a DB or Zone. Default is 3.
  • Replica: one of the copies of some Range in a DB or Zone. Each range has as many replicas across the cluster as the replication factor says.
  • Rho: test cluster on GCE
  • Spanner (Google product): another project we get inspiration from
  • TeamCity: one of our continuous integration tools
  • Time series: data in a DB that is organized primarily by time; commonly used to store events over time; sometimes queried by OLAP applications
  • Trigger: a way for a user to ask the database to ping the user (or an app) back when some data changes
  • Zone config/zones: CockroachDB's way to apply different configuration parameters to different parts of a cluster. It can be used to set constraints on replication (e.g. at least one copy must be on a different continent) or for data sovereignty (e.g. no copies of this data should reside outside EU territory).
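
To make the "Geospatial index" entry a bit more concrete, here is a minimal Python sketch of one kind of space-filling curve, the Z-order (Morton) curve. It is only an illustration: the function names `interleave_bits` and `geo_key` and the 16-bit grid are assumptions made for this example, and nothing here describes how CockroachDB itself works.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return key


def geo_key(lat: float, lon: float, bits: int = 16) -> int:
    """Map a lat/long pair to a single integer that can be used as a sort key.

    The grid resolution (`bits` per axis) is an arbitrary choice for this sketch.
    """
    cells = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * cells)  # longitude scaled to 0..cells
    y = int((lat + 90.0) / 180.0 * cells)   # latitude scaled to 0..cells
    return interleave_bits(x, y, bits)
```

Sorting rows by `geo_key` tends to place points that are near each other on the map near each other in the ordering, although some neighbouring points can still end up far apart.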