This page addresses the sixth concept of Maven: repositories.
It covers how repositories factor out artifact storage on disk, and what repositories are for.
It also touches on the development cycle and the fact that artifacts must be stored somewhere.
This page is currently under construction. I expect to have it done no later than 2006-11-05 - Mykel
Maven repositories are arbitrary, accessible locations designed to store the artifacts that Maven builds produce. Over the course of the development process, a project will produce versions of its output for consumption by other projects. Ultimately, applications are assembled by combining various artifacts and packaging them together into usable software. The artifacts to be packaged have to reside somewhere, and that's where the Maven repository comes in. Maven repositories are the central location for Maven artifacts, and Maven itself knows how to traverse the structure of a repository in order to acquire artifacts that the current build needs but does not have direct (and local) access to.
Overriding Defined Repositories
Maven repositories come in a variety of forms, and there are a number of different means of classifying them. Some repositories are stored locally, some remotely. There are different sorts of repositories for storing fixed versions of artifacts, known as releases, as opposed to versions that are currently being developed (snapshots).
Local vs. Remote repository
Maven may use two different types of repositories in the course of a build.
Remote repositories are the ones most often discussed during a maven build. At its simplest, a remote repository is a file system hosted by a web server that provides download access to the artifacts it contains.
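As a rough illustration of how a build is pointed at a remote repository, a POM can declare one in its repositories section. This is only a sketch: the id, name, and URL below are hypothetical placeholders, not a real server.

```xml
<!-- Fragment of a pom.xml; it belongs inside the <project> element. -->
<repositories>
  <repository>
    <!-- Hypothetical id, name, and URL, for illustration only. -->
    <id>example-remote</id>
    <name>Example Remote Repository</name>
    <url>http://repo.example.org/maven2</url>
  </repository>
</repositories>
```

Repositories declared this way are consulted in addition to any repository the build inherits.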
A single repository is defined in the Maven Super POM as "central". This repository is hosted at Ibiblio, and is noted in the Super POM as http://repo1.maven.org/maven2. Also accessible is http://www.ibiblio.org/maven2, which is or at least was an alias for the URL defined in the Super POM.
TODO Could someone determine what the exact situation is as of now?
Ultimately, all artifacts used in the course of a build are retrieved from the local repository, because any required artifact is downloaded into the local repository, where it becomes accessible to the build. Thus, effectively, Maven never really uses the artifacts that are stored remotely. Instead, it copies the remote artifact into the local repository and uses that copy. This distinction is often lost on Maven newbies.
By default, the local repository is located in the HOME/.m2/repository directory of the user invoking the Maven build. For Unix users, HOME is usually denoted by "~". For Windows users, it is very often C:\Documents and Settings\insertusername.
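If the default location is inconvenient, the local repository can be moved by setting the localRepository element in the user's settings.xml. A minimal sketch, assuming a hypothetical target directory:

```xml
<!-- HOME/.m2/settings.xml (user-level Maven settings) -->
<settings>
  <!-- Hypothetical path, for illustration; any directory the user can write to should do. -->
  <localRepository>/data/maven/repository</localRepository>
</settings>
```

With this in place, downloaded artifacts should land under the configured directory instead of HOME/.m2/repository.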
Release vs. Snapshot repositories
A further segregation of remote repositories is the separation of Release repositories and Snapshot repositories. Note that release vs snapshot repos are always remote, since the local repository has effectively no concept of differentiation between releases and snapshots.
Also note that a single remote repository can be set to serve both snapshots and releases simultaneously.
Release repositories are designed to hold artifacts that have fixed values and released versions, hence the name. Given the lack of control on artifact version names, a release is effectively any artifact whose version does NOT end in the magic string "-SNAPSHOT".
A release repository is denoted by having the releases tag in the repo definition enabled (set to true), indicating that a given remote repo is capable of serving releases.
Snapshots are defined as artifacts whose version ends in "-SNAPSHOT".
Snapshot repositories are denoted by having the snapshots tag enabled (set to true) in the repository definition.
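The release and snapshot settings described above might look like this in a POM. This is a sketch only: the repository ids and URLs are hypothetical, while the releases and snapshots elements with their enabled flags are the standard POM mechanism.

```xml
<!-- Fragment of a pom.xml; it belongs inside the <project> element. -->
<repositories>
  <!-- Serves released versions only (hypothetical id and URL). -->
  <repository>
    <id>example-releases</id>
    <url>http://repo.example.org/releases</url>
    <releases><enabled>true</enabled></releases>
    <snapshots><enabled>false</enabled></snapshots>
  </repository>
  <!-- Serves -SNAPSHOT versions only (hypothetical id and URL). -->
  <repository>
    <id>example-snapshots</id>
    <url>http://repo.example.org/snapshots</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>
```

Setting both enabled flags to true on a single repository entry lets that one repository serve releases and snapshots alike.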
TODO More on snapshots in snapshot repositories
Internal vs. External repositories
In the course of various forms of development, repositories can be further segregated into those which hold artifacts you produce (internal) and those which hold artifacts that others produce (external).
Internal repositories hold artifacts that your project or projects produce. In a solo or very small team environment, the distinction is straightforward: external repositories are usually the ones found on the Internet (central, the codehaus repos, the jboss repos, etc.). However, in an enterprise environment, or really any sort of distributed production, we might need to distinguish what we mean by "artifacts you produce".
For instance, if your software staff is working on a product (P) and that product consists of two components (A and B), then you might just have a single repository to handle all your deployment needs. However, if A actually consists of 27 sub-components, each of which may have sub-components of its own, and B consists of another 15, then you may want to think about splitting your storage across multiple repositories. See "Repositories in Enterprise Development Environments" below for further clarification.
An external repository is any repository that does not meet the criteria for an internal repository. To paraphrase, one project's internal repo is another project's external repo. So one group's internal repositories might be considered external repos by another group within the same enterprise.
Repositories in Enterprise Development Environments
Why is it so hard to do enterprise Maven development? The whole purpose of Maven is to provide structure and support to produce effective, repeatable builds.
One reason is that developers usually use IDEs to do development, and Maven plays only marginally well with those environments.
Another reason is that in the course of development, developers will often be deploying artifacts on top of each other. This is a rather compelling argument for segregating repositories into groups.
Barring any security issues, there can usually be a single release repository where all the released artifacts of the enterprise are stored. If two separate projects are going to depend on org.mydom:DefaultArtifact:1.0, then you probably want both of them to reference the same instance of that artifact. One very easy way to ensure that is to keep only one copy of the dependent artifact.
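For illustration, both projects would carry the same dependency declaration, using the coordinates from the example above, so that each resolves the one copy held in the shared release repository:

```xml
<!-- Fragment of a pom.xml; it belongs inside the <project> element of each depending project. -->
<dependencies>
  <dependency>
    <groupId>org.mydom</groupId>
    <artifactId>DefaultArtifact</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
```

Assuming the default repository layout, the resolved copy would typically end up under org/mydom/DefaultArtifact/1.0/ in each developer's local repository.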
However, for snapshot repositories, the picture is often different. Suppose a software production team consists of 100 members, divided into 4 groups of 25, numbered 1-4. Group 1 is responsible for the business services, which consist of 40 inter-related components. Group 2 does data access, with another 20 components. Group 3 handles external interfaces, with 20 components. Group 4 does a simple GUI with 20 components. This brings the total to 100 components.
Groups 1 and 3
Groups 1, 2, and 3
So how do we manage this? Group 1 has 25 members working on 40 components. In the course of the normal Maven lifecycle, the head of the TRUNK for each project would produce SNAPSHOT versions of the artifacts. Groups 2 and 3 might depend on those SNAPSHOTs, or they might not. If group 3 does depend on group 1's SNAPSHOTs and a snapshot is bad, then group 3 is forced to fall back to a previous version. This sounds simple, but in practice it can be incredibly arduous and requires meticulous management of the POMs and constant vigilance. At the same time, we don't want to stop group 1 from deploying SNAPSHOTs for testing within group 1.
One answer is to have separate SNAPSHOT repositories for each group, where the artifacts for a given group have definitions for that group's snapshot repo, probably within a group parent POM. Another repository, defined for all projects, maybe in an enterprise parent POM, is the general release repository. Thus, group 1's internal development could depend on group 1's snapshots, and when artifacts are ready to be released, they could be released to the general release repo and thus become accessible to the other groups.
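One way the per-group arrangement could be expressed is a parent POM for group 1 that resolves from, and deploys to, a group-specific snapshot repository plus the shared release repository. This is a sketch under assumptions: all ids and URLs are hypothetical, the remaining parent POM elements are omitted, and the transport used for deployment would depend on how the repositories are actually hosted.

```xml
<project>
  <!-- Hypothetical parent POM for group 1; groupId, artifactId, version, etc. omitted. -->

  <!-- Where group 1 builds look for dependencies. -->
  <repositories>
    <!-- General release repository shared by every group. -->
    <repository>
      <id>enterprise-releases</id>
      <url>http://repo.example.org/releases</url>
      <releases><enabled>true</enabled></releases>
      <snapshots><enabled>false</enabled></snapshots>
    </repository>
    <!-- Snapshot repository used only by group 1. -->
    <repository>
      <id>group1-snapshots</id>
      <url>http://repo.example.org/group1-snapshots</url>
      <releases><enabled>false</enabled></releases>
      <snapshots><enabled>true</enabled></snapshots>
    </repository>
  </repositories>

  <!-- Where group 1 builds deploy their own artifacts. -->
  <distributionManagement>
    <repository>
      <id>enterprise-releases</id>
      <url>http://repo.example.org/releases</url>
    </repository>
    <snapshotRepository>
      <id>group1-snapshots</id>
      <url>http://repo.example.org/group1-snapshots</url>
    </snapshotRepository>
  </distributionManagement>
</project>
```

Other groups would declare only the shared release repository, so they would see group 1's artifacts only once they have been released.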
TODO Explain this better