SchemaSpy¶
Document your database simply and easily
Do you hate starting on a new project and having to try to figure out someone else’s idea of a database? Or are you in QA and the developers expect you to understand all the relationships in their schema? If so then this tool’s for you.
Overview¶
SchemaSpy is a Java-based tool (requires Java 8 or higher) that analyzes the metadata of a schema in a database and generates a visual representation of it in a browser-displayable format. It lets you click through the hierarchy of database tables via child and parent table relationships as represented by both HTML links and entity-relationship diagrams. It’s also designed to help resolve the obtuse errors that a database sometimes gives related to failures due to constraints.
SchemaSpy comes with ABSOLUTELY NO WARRANTY.
SchemaSpy is free software licensed and distributed under LGPL version 3 or later
SchemaSpy can be redistributed under the conditions of LGPL version 3 or later.
http://www.gnu.org/licenses/
If you like SchemaSpy, don’t forget to give us a star on Github.
SchemaSpy produces dot-file and uses either the dot executable from Graphviz or embedded viz.js to generate graphical representations of the table/view relationships. This was initially added for people who see things visually.
Now the graphical representation of relationships is a fundamental feature of the tool. Graphvis is not required to view the output generated by SchemaSpy, but the dot program should be in your PATH
(not CLASSPATH) or use the -gv
or -vizjs
arguments, else none of the entity relationship diagrams will be rendered.
SchemaSpy uses JDBC’s database metadata extraction services to gather the majority of its information, but has to make vendor-specific SQL queries to gather some information such as the SQL associated with a view and the details of check constraints. The differences between vendors have been isolated to configuration files and are extremely limited. Almost all of the vendor-specific SQL is optional.
SchemaSpy was mentioned in one of th O’Reilly’s book Java Power Tools
What’s new¶
6.0.0¶
- Html report
- Now uses mustache
- DataTables for data
- Markdown rendering of comments
- DatabaseTypes:
- sqlite-xerial
- redshift
- orathin-service
- netezza
- mysql-socket
- mssql08
- mssql08-jtds
- mssql08-jtds-instance
- impala
- hive
- hive-kerberos-driverwrapper
- hive-kerberos-driverwrapper-zookeeper
6.1.0¶
- Diagrams
- Now has option to use embedded viz.js (no need for Graphviz)
-vizjs
- Limit the degree of separation in table diagrams. Allowed values are 1 or 2.
- Now has option to use embedded viz.js (no need for Graphviz)
- XML
- Now includes routines
- Html report
- Column page loads faster
- Table page contains check constraints
Features¶
- Supports most JDBC compliant dbms (support missing? you can add your own)
- Generates ER diagram for foreign keys
- Generates ER diagram for implied relationships (name, type) of a column matches a primary key
- Generates ER diagram for relationships based on rails naming conventions
- Shows column relationship and actions
- Shows routines (Functions/Stored procedures)
- Shows views and definitions
- Will render markdown present in comments
- Allows for supplying additional metadata, see SchemaMeta
- Present a set of found anomalies
Sample documentation¶
Browse some sample documentation generated by SchemaSpy. Note that this was run against an extremely limited schema so it doesn’t show the full power of the tool.
SchemaSpy GUI¶
SchemaSpy is a command line tool. If you’re more comfortable with the point-and-click approach then try out Joachim Uhl’s SchemaSpyGUI.
Installation¶
Requirements¶
Before you can start using SchemaSpy you must have installed two things in your system environment.
Java 8¶
Download instructions for all operating systems: https://java.com/en/download/manual.jsp
Optional¶
Since version 6.1.0 Graphviz is no longer required. There is an embedded viz.js that can be used by adding command line argument vizjs
Graphviz¶
- Windows
The easiest way to install Graphviz is to download the msi package from http://www.graphviz.org/download/
Warning
Remember to add the folder containing Graphviz’s dot.exe application to your system PATH variable, eg.
C:\Program Files (x86)\Graphviz2.38\bin
- Linux, Mac OS
Please read carefully the detailed instructions on how to install Graphviz on your os version http://www.graphviz.org/download/.
SchemaSpy¶
SchemaSpy is just a single executable jar-file (schemaspy-[version].jar), you can download releases from http://schemaspy.org or the github releases page https://github.com/schemaspy/schemaspy/releases
If you feel adventurous there is a link in the README.md for latest builds.
When you have your jar-file head on over to Get Started
Get Started¶
Configuration¶
Parameters can be specified in the command line (described below) or you can predefine configuration in the file.
SchemaSpy will search configuration file in
<current-dir>/schemaspy.properties
To use an alternative configuration file run SchemaSpy with parameter: java -jar schemaspy.jar -configFile path/to/config.file
Config file example:
# type of database. Run with -dbhelp for details
schemaspy.t=mssql
# optional path to alternative jdbc drivers.
schemaspy.dp=path/to/drivers
# database properties: host, port number, name user, password
schemaspy.host=server
schemaspy.port=1433
schemaspy.db=db_name
schemaspy.u=database_user
schemaspy.p=database_password
# output dir to save generated files
schemaspy.o=path/to/output
# db scheme for which generate diagrams
schemaspy.s=dbo
Running SchemaSpy¶
You can easily run SchemaSpy from the command line:
java -jar schemaspy.jar -t dbType -dp C:/sqljdbc4-3.0.jar -db dbName -host server -port 1433 [-s schema] -u user [-p password] -o outputDir
Parameters priority¶
It is important to notice, that command-line parameters override those configured in schemaspy.properties file.
Commonly used parameters¶
- [-t databaseType]
- Type of database (e.g. ora, db2, etc.). Use -dbhelp for a list of built-in types. Defaults to ora.
- [-db dbName]
- Name of database to connect to
- [-u user]
- Valid database user id with read access. A user id is required unless -sso is specified.
- [-s schema]
- Database schema. This is optional if it’s the same as user or isn’t supported by your database. Use -noschema if your database thinks it supports schemas but doesn’t (e.g. older versions of Informix).
- [-p password]
- Password associated with that user. Defaults to no password.
- [-o outputDirectory]
- Directory to write the generated HTML/graphs to
- [-dp pathToDrivers]
- Looks for drivers here before looking in driverPath in [databaseType].properties. The drivers are usually contained in .jar or .zip files and are typically provided by your database vendor. Supports a directory as argument, which will add directory and all content to classpath, will recurse. Supports multiple paths separated by OS dependent path separator
- [-hq] or [-lq]
- Generate higher or lower-quality diagrams. Various installations of Graphviz (depending on OS and/or version) will default to generat /ing either higher or lower quality images. That is, some might not have the “lower quality” libraries and others might not have the “higher quality” libraries. Higher quality output takes longer to generate and results in significantly larger image files (which take longer to download / display), but the resultant Entity Relationship diagrams generally look better.
- [-imageformat outputImageFormat]
- The format of the image that gets generated. Supported formats are svg and png. Defaults to png.
E.g.
-imageformat svg
For a comprehensive listing see Command-Line Arguments
Advanced Usage¶
Supply Connection-properties¶
- As an example running mysql with a new driver you’ll get warning
- According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn’t set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to ‘false’. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
This can be omited by addind connection property useSSL=false
To add this connection property add following to commandline:
-connprops useSSL\=false
-connprops
can also take a properties file as argument but when escaping the =
with \
it will use it as “useSSL=false”.
If key or value contains /
it needs to be escaped with a single \
. Multiple pairs can be separated by ;
.
If using linux shell or git bash in windows the \
also needs to be escaped with a \
so for example it would look like useSSL\\=false
.
Create your own DB type¶
As an example we will add the connection property from above to the mysql db-type
Create a new file in same directory as the schemaspy-jar, let’s call it mysql-nossl.properties
Add the following content to mysql-nossl.properies:
extends=mysql connectionSpec=jdbc:mysql://<hostOptionalPort>/<db>?useSSL=false
Now you can run schamaspy with -t mysql-nossl
If you want to have a closer look at the db-types you can find them at github
Supply or override database type¶
Create a new file in same directory as the schemaspy-jar, let’s call it myDbType.properties
Start by extending the database type you want to supply or override sql statements for as an example we will extends postgresSQL:
extends=pgsql
Queries you can supply or override can be found at Sql query instead of DatabaseMetaData we will override routines:
extends=pgsql selectRoutinesSql=select r.routine_name, case p.prokind when 'f' then 'FUNCTION' when 'p' then 'PROCEDURE' when 'a' then 'AGGREGATE' when 'w' then 'WINDOW' else 'UNKNOWN' end as routine_type, case when p.proretset then 'SETOF ' else '' end || case when r.data_type = 'USER-DEFINED' then r.type_udt_name else r.data_type end as dtd_identifier, r.external_language as routine_body, r.routine_definition, r.sql_data_access, r.security_type, r.is_deterministic, d.description as routine_comment from information_schema.routines r left join pg_namespace ns on r.routine_schema = ns.nspname left join pg_proc p on ns.oid = p.pronamespace and r.routine_name = p.proname left join pg_description d on d.objoid = p.oid where r.routine_schema = :schema
We also want to add materialized views to view as document at Other Properties:
extends=pgsql selectRoutinesSql=select r.routine_name, case p.prokind when 'f' then 'FUNCTION' when 'p' then 'PROCEDURE' when 'a' then 'AGGREGATE' when 'w' then 'WINDOW' else 'UNKNOWN' end as routine_type, case when p.proretset then 'SETOF ' else '' end || case when r.data_type = 'USER-DEFINED' then r.type_udt_name else r.data_type end as dtd_identifier, r.external_language as routine_body, r.routine_definition, r.sql_data_access, r.security_type, r.is_deterministic, d.description as routine_comment from information_schema.routines r left join pg_namespace ns on r.routine_schema = ns.nspname left join pg_proc p on ns.oid = p.pronamespace and r.routine_name = p.proname left join pg_description d on d.objoid = p.oid where r.routine_schema = :schema viewTypes=VIEW,MATERIALIZED VIEW
Now run schemaspy with your own database type
-t myDbType
Create you own DB type super advanced¶
Now we are going to connect to mysql thru unix socket, put on your helmets
Download a unix socket library for java and all of it’s dependencies, for simplicity put them in a sub-folder called
drivers
in the same folder as the schemaspy-jar:junixsocket-common-2.0.4.jar junixsocket-mysql-2.0.4.jar junixsocket-native-2.0.4-x86_64-MacOSX-gpp-jni.nar <- Im on OSX junixsocket-native-2.0.4.nar mysql-connector-java-5.1.32.jar native-lib-loader-2.1.5.jar slf4j-api-1.7.25.jar slf4j-simple-1.7.25.jar
Create your own db-type let’s call it my-mysql-socket.properties in same folder as the schemaspy-jar with following content:
connectionSpec=jdbc:mysql://<host>/<db>?socketFactory=<socketFactory>&socket=<socket> socketFactory=ClassName of socket factory which must be in your classpath socket=Path To Socket
Now run schemaspy with the following options:
java -jar [schemaspy.jar] -t my-mysq-socket \ -dp lib/mysql-connector-java-[version].jar \ -loadjars \ -db [DBName] \ -host localhost \ -port 3306 \ -u [User] \ -socketFactory org.newsclub.net.mysql.AFUNIXDatabaseSocketFactory \ -socket [pathToSocket] \ -o [outputDir]
Replace values accordingly.
Yes, you need to specify-db
,-host
,-port
Yes, the socketFactory could have been written directly into the properties-file, this is just an example, mysql-socket exists as a db-type exactly like this.
And since you might want to use another unix socket library this doesn’t close any doors.
Add markdown comments using additional metadata¶
Schemaspy supports markdown in comments markdown
Not all dbms supports comments or long enough comments or comments might just be missing.
Schemaspy also supports supplying additional metadata SchemaMeta
More precise the ability to add/replace comments. Add comments/remarks
1 2 3 4 5 6 7 8 | <schemaMeta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schemaspy.org/xsd/6/schemameta.xsd" >
<comments>Database comment</comments>
<tables>
<table name="ACCOUNT" comments="I've added comment that links using markdown to markdown documentation [markdown](https://daringfireball.net/projects/markdown/)" >
<column name="accountId" comments="And now the schemaspy avatar ![avatar](https://avatars3.githubusercontent.com/u/20635098?s=20&v=4 "SchemaSpy")" />
</table>
</tables>
</schemaMeta>
|
Now just run with -meta [path-to-above-xml]
Single Sign-On¶
mssql¶
When running SchemaSpy
before -jar SchemaSpy-[version].jar
add -Djava.library.path=[path-to-dir-containing-sqljdbc_auth.dll]
after -jar SchemaSpy-[version].jar
add -sso
When using cmd add -connprops integratedSecurity\=true
When using git bash in windows add -connprops integratedSecurity\\=true
mssql-jtds¶
When running SchemaSpy
before -jar SchemaSpy-[version].jar
add -Djava.library.path=[path-to-dir-containing-ntlmauth.dll]
after -jar SchemaSpy-[version].jar
add -sso
Command-Line Arguments¶
Most of the command-line arguments can be specified in a properties file either with the default name schemaspy.properties
or in a file specified using -configFile
the command-line arguments should be prefixed with schemaspy. As an example -sso
would be schemaspy.sso
and
-u username
would be schemaspy.u=username
.
General¶
- [-h]
- Print help message
- [-dbhelp]
- Print databaseType required arguments
- [-configFile filePath]
- Path to configFile to be used, default is to look for schemaspy.properties
- [-o outputDirectory]
- Directory to write the generated HTML/graphs to
DatabaseType¶
You can create you’re own databaseType so lets go through how it works.
Selection¶
On the commandline you specify the databaseType using the option -t
.
The option can be specified with either [name].properties or just [name]
the .properties will be added if missing. So if you create one, be sure
to have .properties extension.
- Example:
-t mysql
- or
-t mysql.properties
- The search order is:
- user.dir/
- Classpath
- Classpath in schemaspy supplied location
This actually means that if you supply -t my_conf/mydbtype
- It will look for:
- file: $user.dir/my_conf/mydbtype.properties
- Classpath: my_conf/mydbtype.properties
- Classpath: org/schemaspy/types/my_conf/mydbtype.properties
Layout¶
It can contain wast amount of properties so we will break it down. The Properties-file can contain instructions.
- extends
extends
which does what it means, it allows one to override or add properties to an existing databaseType (by specifying a parent/base)As an example:
extends=mysql
which you can see in mysql-socket.properties
- include
include.[n]
is a bit different it allows one to add a single property from another databaseType.[n]
is substituted for a number. The value has the form of[databaseType]::[key]
.As an example:
include.1=mysql::schemaSpec
This would have been valid in the mariadb.properties
Then we have required properties:
- description=
- Description for the databaseType (mostly used in logging)
- connectionSpec=
- We will talk more about this one. It’s the connectionUrl used, but it supports token replacement
- driver=
- FQDN of the JDBC driver as an example
org.h2.Driver
ConnectionSpec¶
Let’s dive a bit deeper into the connectionSpec.
As an example from mysql-socket:
extends=mysql
connectionSpec=jdbc:mysql://<host>/<db>?socketFactory=<socketFactory>&socket=<socket>
socketFactory=ClassName of socket factory which must be in your classpath
socket=Path To Socket
We mentioned extends earlier.
ConnectionSpec contains the connectionUrl used with the jdbc driver, some might refer to it as the connectionString.
connectionSpec allow token replacement, a token is <[tokenName]>
.
In the above example we have host, db, socketFactory, socket.
This means that when used it expects the following commandline arguments:
-h [host] (for host)
-db [dbname] (for db)
-socketFactory [socketFactory class]
-socket [path to socket]
host and db are already known, but -socketFactory
and -socket
has become a new commandline argument.
The presence of the keys in the databaseType properties file is only for description, it’s printed when -dbhelp
is used as a commandline argument.
(db and host located in databaseType mysql which is extended)
There is also a synthetic token that can be replaced <hostOptionalPort> which combines host and port if port is supplied.
Default separator is :
but can be changed by specifying another under the key hostPortSeparator
Other Properties¶
- driverPath=
- path to classpath resources that will be used when trying to create the jdbc Driver in java
same as commandline argument
-dp
- dbThreads=
- number of threads that can be used to analyze the database
- schemaSpec=
- regular expression used in conjunction with
-all
(and can be command line param-schemaSpec
) - tableTypes=
- Which types should be considered tables, default is TABLE
- viewTypes=
- Which types should be considered views, default is VIEW
Sql query instead of DatabaseMetaData¶
When metadata in JDBC isn’t cutting the mustard. You can replace it with a sql query. They are prepared and supports named parameters as long as they are available. Data is retrieved by column label. So additional columns are ok, but you might need to alias columns so that they are returned correctly to schemaspy.
- :dbname
- DatabaseName
-db
- :schema
- Schema
-s
- :owner
- alias for :schema
- :table
- table that the query relates to (think selectRowCountSql)
- :view
- alias for :table
- :catalog
- Catalog
-cat
- Possible overrides:
- selectSchemasSql=
- Fetch comments for a schema, expected columns:
schema_comment - selectCatalogsSql=
- Fetch comments for a catalog, expected columns:
catalog_comment - selectTablesSql=
- Fetch tables, expected columns:
table_name, table_catalog, table_schema, table_comment, table_rows - selectViewsSql=
- Fetch views, expected columns:
view_name, view_catalog, view_schema, view_comment, view_definition - selectIndexesSql=
- Fetch indexes, expected columns:
INDEX_NAME, TYPE, NON_UNIQUE, COLUMN_NAME, ASC_OR_DESC - selectPrimaryKeysSql=
- Fetch table PKs, expected columns:
TABLE_CAT, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, KEY_SEQ, PK_NAME - selectRowCountSql=
- Fetch row count for a table, expected columns:
row_count - selectColumnTypesSql=
- Fetch column type for all columns, expected columns:
table_name, column_name, column_type, short_column_type - selectRoutinesSql=
- Fetch routines, expected columns:
routine_name, routine_type, dtd_identifier, routine_body, routine_definition,sql_data_access, security_type, is_deterministic, routine_comment - selectRoutineParametersSql=
- Fetch parameters for routines, expected columns:
specific_name, parameter_name, dtd_identifier, parameter_mode - selectViewSql=
- Fetch definition for a view, expected columns:
view_definition, text (text has been deprecated) - selectCheckConstraintsSql=
- Fetch check constraints for all tables, expected columns:
table_name, constraint_name - selectTableIdsSql=
- Fetch ids for all tables, expected columns:
table_name, table_id - selectIndexIdsSql=
- Fetch ids for all indexes, expected columns:
table_name, index_name, index_id - selectTableCommentsSql=
- Fetch comments for all tables, expected columns:
table_name, comments - selectColumnCommentsSql=
- Fetch comments for all columns, expected columns:
table_name, column_name, comments
SchemaMeta¶
Is a way to modify input that will affect output from SchemaSpy.
All these instructions are defined in xml the schema can be found here
Schema contains documentation but lets go through the above mentioned features.
Add comments/remarks¶
The xsd currently allows both comments and remarks. However remarks has been deprecated.
So adding a comment will either add, if missing from database, or replace if comments/remarks exist. Supports markdown, example see Add markdown comments using additional metadata
1 2 3 4 5 6 7 8 | <schemaMeta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schemaspy.org/xsd/6/schemameta.xsd" >
<comments>Database comment</comments>
<tables>
<table name="ACCOUNT" comments="Table comment">
<column name="accountId" comments="Column comment"/>
</table>
</tables>
</schemaMeta>
|
Add relationships¶
1 2 3 4 5 6 7 8 9 10 11 12 | <schemaMeta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schemaspy.org/xsd/6/schemameta.xsd" >
<tables>
<table name="AGENT">
<column name="acId" type="INT">
<foreignKey table="ACCOUNT" column="accountId" />
</column>
<column name="coId" type="INT">
<foreignKey table="COMPANY" column="companyId" />
</column>
</table>
</tables>
</schemaMeta>
|
Add remote tables¶
Specifying the remoteCatalog and remoteSchema attributes on a table makes it a remote table and as such a logical table.
1 2 3 4 5 6 7 8 9 10 11 12 13 | <schemaMeta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schemaspy.org/xsd/6/schemameta.xsd" >
<tables>
<table name="CONTRACT" remoteCatalog="other" remoteSchema="other">
<column name="contractId" autoUpdated="true" primaryKey="true" type="INT"/>
<column name="accountId" type="INT">
<foreignKey table="ACCOUNT" column="accountId"/>
</column>
<column name="agentId" type="INT">
<foreignKey table="AGENT" column="aId"/>
</column>
</table>
</tables>
</schemaMeta>
|
Add columns¶
1 2 3 4 5 6 7 | <schemaMeta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schemaspy.org/xsd/6/schemameta.xsd" >
<tables>
<table name="ACCOUNT">
<column name="this_is_new" type="INT" />
</table>
</tables>
</schemaMeta>
|
Exclude columns from implied relationships¶
Explicitly disables relationships to or from this column that may be implied by the column’s name, type and size.
Available options: to, from, all, none
Default: none
1 2 3 4 5 6 7 | <schemaMeta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schemaspy.org/xsd/6/schemameta.xsd" >
<tables>
<table name="AGENT">
<column name="accountId" type="INT" disableImpliedKeys="all"/>
</table>
</tables>
</schemaMeta>
|
Exclude columns from diagrams¶
Sometimes the associations displayed on a relationships diagram cause the diagram to become much more cluttered than it needs to be. Enable this setting to not show the relationships between this column and other columns.
Use exceptDirect to disable associations on all diagrams except for the diagrams of tables directly (within one degree of separation) connected to this column.
Available options: all, exceptDirect, none
Defaults: none
1 2 3 4 5 6 7 | <schemaMeta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schemaspy.org/xsd/6/schemameta.xsd" >
<tables>
<table name="COUNTRY">
<column name="countryId" type="INT" disableDiagramAssociations="all"/>
</table>
</tables>
</schemaMeta>
|
Frequent Asked Questions¶
General¶
Schema or Catalog name can’t be null¶
This means that Schema or Catalog information could not be extracted from connection.
I this case you need to add options -s [schemaName]
or -cat [catalogName]
In most cases for catalog you can use -cat %
In mysql you can use same as -db
“Cannot enlarge memory arrays” when using viz.js¶
According to viz.js documentation the memory is default 16MB this should be enough.
We have increased this to 64 MB if you receive this error, please report this to us.
I just receive a cryptic error like “ERROR - null”¶
The code has previously avoided to log stracktraces, we now log them but only when
-debug
is used. So any cryptic error can be enhanced with stacktrace by running
SchemaSpy with the argument -debug