Database administrator

From Wikipedia, the free encyclopedia

A database administrator (DBA) is a person who is responsible for the environmental aspects of a database. In general, these include:

  • Recoverability - Creating and testing Backups
  • Integrity - Verifying or helping to verify data integrity
  • Security - Defining and/or implementing access controls to the data
  • Availability - Ensuring maximum uptime
  • Performance - Ensuring maximum performance
  • Development and testing support - Helping programmers and engineers to efficiently utilize the database.

The role of a database administrator has changed according to the technology of database management systems (DBMSs) as well as the needs of the owners of the databases. For example, although logical and physical database design are traditionally the duties of a database analyst or database designer, a DBA may be tasked to perform those duties.

Contents

[edit] Duties

The duties of a database administrator vary and depend on the job description, corporate and Information Technology (IT) policies and the technical features and capabilities of the DBMS being administered. They nearly always include disaster recovery (backups and testing of backups), performance analysis and tuning, data dictionary maintenance, and some database design.

Some of the roles of the DBA may include

  • Installation of new software — It is primarily the job of the DBA to install new versions of DBMS software, application software, and other software related to DBMS administration. It is important that the DBA or other IS staff members test this new software before it is moved into a production environment.
  • Configuration of hardware and software with the system administrator — In many cases the system software can only be accessed by the system administrator. In this case, the DBA must work closely with the system administrator to perform software installations, and to configure hardware and software so that it functions optimally with the DBMS.
  • Security administration — One of the main duties of the DBA is to monitor and administer DBMS security. This involves adding and removing users, administering quotas, auditing, and checking for security problems.
  • Data analysis — The DBA will frequently be called on to analyze the data stored in the database and to make recommendations relating to performance and efficiency of that data storage. This might relate to the more effective use of indexes, enabling "Parallel Query" execution, or other DBMS specific features.
  • Database design (preliminary) — The DBA is often involved at the preliminary database-design stages. Through the involvement of the DBA, many problems that might occur can be eliminated. The DBA knows the DBMS and system, can point out potential problems, and can help the development team with special performance considerations.
  • Data modeling and optimization — By modeling the data, it is possible to optimize the system layout to take the most advantage of the I/O subsystem.
  • Responsible for the administration of existing enterprise databases and the analysis, design, and creation of new databases.

→ Data modeling, database optimization, understanding and implementation of schemas, and the ability to interpret and write complex SQL queries
→ Proactively monitor systems for optimum performance and capacity constraints
→ Establish standards and best practices for SQL
→ Interact with and coach developers in SQL scripting

[edit] Definition of a Database

A database is a collection of related information, accessed and managed by its DBMS. After experimenting with hierarchical and networked DBMSs during the 1970’s, the IT industry became dominated by relational DBMSs (Or Object-Relational Database Management System) such as Informix database, Oracle, Sybase, and, later on, Microsoft SQL Server and the like.

In a strictly technical sense, for any database to be defined as a "Truly Relational Model Database Management System," it should, ideally, adhere to the twelve rules defined by Edgar F. Codd, pioneer in the field of relational databases. To date, while many come close, it is admitted that nothing on the market adheres 100% to those rules, any more than they are 100% ANSI-SQL compliant.

While IBM and Oracle technically were the earliest on the RDBMS scene, many others have followed, and while it is unlikely that miniSQL still exist in their original form, Monty's MySQL is still extant and thriving, along with the Ingres-descended PostgreSQL. Microsoft Access - the 1995+ versions, not the prior versions - were, despite various limitations, technically the closest thing to being 'Truly Relational' DBMS's for the desktop PC, with Visual FoxPro, and many other desktop products marketed at that time far less compliant with Codd's Rules.

A relational DBMS manages information about types of real-world things (entities) in the form of tables that represent the entities. A table is like a spreadsheet; each row represents a particular entity (instance), and each column represents a type of information about the entity (domain). Sometimes entities are made up of smaller related entities, such as orders and order lines; and so one of the challenges of a multi-user DBMS is provide data about related entities from the standpoint of an instant of logical consistency.

Properly managed relational databases minimize the need for application programs to contain information about the physical storage of the data they access. To maximize the isolation of programs from data structures, relational DBMSs restrict data access to the messaging protocol SQL, a nonprocedural language that limits the programmer to specifying desired results. This message-based interface was a building block for the decentralization of computer hardware, because a program and data structure with such a minimal point of contact become feasible to reside on separate computers.

[edit] Recoverability

Recoverability means that, if a data entry error, program bug or hardware failure occurs, the DBA can bring the database backward in time to its state at an instant of logical consistency before the damage was done. Recoverability activities include making database backups and storing them in ways that minimize the risk that they will be damaged or lost, such as placing multiple copies on removable media and storing them outside the affected area of an anticipated disaster. Recoverability is the DBA’s most important concern.

Recoverability, also sometimes called "disaster recovery," takes two primary forms. First the backup, then recovery tests.

The backup of the database consists of data with timestamps combined with database logs to change the data to be consistent to a particular moment in time. It is possible to make a backup of the database containing only data without timestamps or logs, but the DBA must take the database offline to do such a backup.

The recovery tests of the database consist of restoring the data, then applying logs against that data to bring the database backup to consistency at a particular point in time up to the last transaction in the logs. Alternatively, an offline database backup can be restored simply by placing the data in-place on another copy of the database.

If a DBA (or any administrator) attempts to implement a recoverability plan without the recovery tests, there is no guarantee that the backups are at all valid. In practice, in all but the most mature RDBMS packages, backups rarely are valid without extensive testing to be sure that no bugs or human error have corrupted the backups.

[edit] Integrity

Integrity means that the database, or the programs that create its content, embody means of preventing users who provide data from breaking the system’s business rules. For example, a retailer may have a business rule that only individual customers can place orders; and so every order must identify one and only one customer. Oracle Server and other relational DBMSs enforce this type of business rule with constraints, which are configurable implicit queries. To continue the example, in the process of inserting a new order the database may query its customer table to make sure that the customer identified by the order exists.

[edit] Security

Security means that users’ ability to access and change data conforms to the policies of the business and the delegation decisions of its managers. Like other metadata, a relational DBMS manages security information in the form of tables. These tables are the “keys to the kingdom” and so it is important to protect them from intruders. so that is why the security is more and more important for the databases.

[edit] Performance

Performance means that the database does not cause unreasonable online response times, and it does not cause unattended programs to run for an unworkable period of time. In complex client/server and three-tier systems, the database is just one of many elements that determine the performance that online users and unattended programs experience. Performance is a major motivation for the DBA to become a generalist and coordinate with specialists in other parts of the system outside of traditional bureaucratic reporting lines.

Techniques for database performance tuning have changed as DBA's have become more sophisticated in their understanding of what causes performance problems and their ability to diagnose the problem.

In the 1990s, DBAs often focused on the database as a whole, and looked at database-wide statistics for clues that might help them find out why the system was slow. Also, the actions DBAs took in their attempts to solve performance problems were often at the global, database level, such as changing the amount of computer memory available to the database, or changing the amount of memory available to any database program that needed to sort data.

DBA's now understand that performance problems initially must be diagnosed, and this is best done by examining individual SQL statements, table process, and system architecture, not the database as a whole. Various tools, some included with the database and some available from third parties, provide a behind the scenes look at how the database is handling the SQL statements, shedding light on what's taking so long.

Having identified the problem, the individual SQL statement can be tuned, and this is usually done by either rewriting it, using hints, adding or modifying indexes, or sometimes modifying the database tables themselves.

[edit] Development/Testing Support

Development and testing support is typically what the database administrator regards as his or her least important duty, while results-oriented managers consider it the DBA’s most important duty. Support activities include collecting sample production data for testing new and changed programs and loading it into test databases; consulting with programmers about performance tuning; and making table design changes to provide new kinds of storage for new program functions.

Here are some IT roles that are related to the role of database administrator:

[edit] See also