
Bring back FeatureCollection!

Abstract:
This is from an email entitled 'Bring back the FeatureCollection!' In
it Chris Holmes argues that a way out of our results/reader mess is to
make use of this nice FeatureCollection construct we came up with, and
then promptly relegated to memory only. Note: you should have your
coffee before reading this.

...

Thinking in the shower this morning I decided to take a look at our
FeatureCollection/Iterator/Reader/Results mess.

The main thought is that FeatureResults was created to give a very similar
construct to a FeatureCollection (though streaming), and perhaps we could
just take it back in that direction.

Right now FeatureCollection is sort of a code word for an in-memory
collection; we went to FeatureResults to ease the transition to
streaming. But I'm thinking we should just take back FeatureCollection:
have DefaultFeatureCollection become MemoryFeatureCollection, and
change DefaultFeatureResults to DefaultFeatureCollection.

Ok, on further thought it would need to be DefaultTypedFeatureCollection.
The main (useful) thing that FeatureResults does that FeatureCollection
does not is return the FeatureType. So we introduce the notion of a
TypedFeatureCollection (something IanS and I thought about but never
implemented). The TypedFeatureCollection would only allow Features that
validate against its FeatureType.
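
A minimal sketch of what that might look like (TypedFeatureCollection
is the hypothetical piece here; the imports assume the 2.x-era
org.geotools.feature packages):

import org.geotools.feature.FeatureCollection;
import org.geotools.feature.FeatureType;

// Hypothetical sketch, not existing API: a FeatureCollection whose
// members must all validate against a single FeatureType.
interface TypedFeatureCollection extends FeatureCollection {

    // The FeatureType that every member feature must conform to.
    FeatureType getFeatureType();

    // Adds a feature; an implementation would reject (say, with an
    // IllegalArgumentException) anything that does not validate
    // against getFeatureType().
    boolean add(Object feature);
}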

Of course this gets into interesting questions: whether the default is
to allow multiple types in the feature collection, or to demand a new
interface for that. And then there are also interesting questions about
combining two TypedFeatureCollections. A construct like that would be
very useful for GML production, as you could get at all the FeatureType
info directly from a single FeatureCollection construct.

Ok, I'm rambling into further implications. To bring this back to the
positive aspects of this decision: FeatureReader and FeatureIterator
have similar, parallel structures. Reader will return the FeatureType,
and it additionally has the close() method. But note that our
FeatureIterator does not implement java.util.Iterator, so we can add
the close() method to it too. We can also introduce a
TypedFeatureIterator that returns the FeatureType, and FeatureReader
can simply implement that as well.
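
In sketch form (the FeatureIterator shape below follows the existing
interface; TypedFeatureIterator is the new, hypothetical piece):

import org.geotools.feature.Feature;
import org.geotools.feature.FeatureType;

// Existing-style iterator: since it does not implement
// java.util.Iterator, adding close() is legal.
interface FeatureIterator {
    boolean hasNext();
    Feature next();
    void close();
}

// Hypothetical: an iterator that also reports its FeatureType.
// FeatureReader could simply implement this as well.
interface TypedFeatureIterator extends FeatureIterator {
    FeatureType getFeatureType();
}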

One thing to note is that FeatureCollection has quite a few more
methods than FeatureResults, since it extends Collection. But most
operations in the Collection interface are optional. The only
non-optional ones are contains, containsAll, equals, hashCode, isEmpty,
iterator, size() (which is effectively FeatureResults.getCount()), and
toArray(). All of these could be implemented fairly easily.

DefaultFeatureCollection could change to take a Query and a
FeatureSource in its constructor; in short it would just expand on
DefaultFeatureResults.
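
A rough sketch of that expansion (the constructor and the size()
behaviour are assumptions on my part, not the actual
DefaultFeatureResults code; it leans on FeatureSource.getCount(Query)):

import java.io.IOException;
import org.geotools.data.FeatureSource;
import org.geotools.data.Query;

// Sketch only: a streaming FeatureCollection backed by a FeatureSource
// and a Query, much as DefaultFeatureResults is today.
class DefaultFeatureCollection {

    private final FeatureSource source;
    private final Query query;

    DefaultFeatureCollection(FeatureSource source, Query query) {
        this.source = source;
        this.query = query;
    }

    // Collection.size(), playing the role of FeatureResults.getCount().
    public int size() {
        try {
            // Many sources can answer this without reading every
            // feature; a real implementation would fall back to
            // iterating when getCount() returns -1 (unavailable).
            return source.getCount(query);
        } catch (IOException e) {
            // Collection.size() cannot throw a checked exception.
            throw new RuntimeException(e);
        }
    }
}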

Doing this also gives us a nice interface for random access, in that a
FeatureList would implement List, and provide get(int index),
indexOf(), and lastIndexOf(). So FeatureSources that support random
access could return List implementations, and clients could then check
to see if they got a List. Or there could be an explicit getList(),
which would build a random-access index for an unordered collection.
Check out feature/FeatureIndex.java for some more interesting ideas
that IanS had. Granted, we would probably need a TypedList; that could
possibly be done nicely by making a Typed interface, which just has a
method FeatureType getFeatureType() (or FeatureType[] getFeatureTypes();
we should decide what the default behaviour is).
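
For example (FeatureList and the client-side capability check are both
hypothetical):

import java.util.List;
import org.geotools.feature.Feature;
import org.geotools.feature.FeatureCollection;

// Hypothetical marker for sources that support random access; the
// List methods (get, indexOf, lastIndexOf) come along for free.
interface FeatureList extends FeatureCollection, List {
}

// Client code could then test for the capability (pre-generics style):
//
//   FeatureCollection features = featureSource.getFeatures(query);
//   if (features instanceof List) {
//       Feature fifth = (Feature) ((List) features).get(4);
//   }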

Another thought is that all FeatureCollections would be typed, at least
in some way. In this case the in-memory FeatureCollection (or any
FeatureCollection that allows any FeatureType) would make use of the
ancestor stuff that we have implemented; everything in it would be of
type Feature (I'm not exactly sure how this works, but I'm thinking
along the lines of the SLD stuff: all must descend from Feature, no?).
This gives a nice little constraint on what a FeatureCollection is vs.
a normal collection: it must contain features. So the getSchema method
would always return at least a Feature FeatureType. I think to get this
to work well we'd have to introduce some type of OrFeatureType, that
is, a validation mechanism that can check against more than a single
FeatureType. Perhaps a Schema class, of which a FeatureType represents
a single instance of a feature. But a Schema could also have multiple
FeatureTypes. Each Schema class could have a getFeatureTypes() call,
and a FeatureTypeSchema (with just the one) could return itself. I
think the key (and potentially only) method of a Schema would be the
validate() method. FeatureType does not currently have a validate
method, but DefaultAttributeType.Feature does; I think perhaps we
should again contemplate FeatureType implementing AttributeType, or at
least have both descend from a common Schema class.
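
In sketch form (every name here is speculative):

import org.geotools.feature.FeatureType;

// Speculative: a Schema is anything a feature can be validated
// against; it may cover one FeatureType or several.
interface Schema {

    // The member FeatureTypes; a single-type FeatureTypeSchema would
    // return an array containing just itself.
    FeatureType[] getFeatureTypes();

    // Throws if the object does not conform to any member type, much
    // as DefaultAttributeType.Feature's validate does today.
    void validate(Object feature) throws IllegalArgumentException;
}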

(Skip this section if you're having any trouble with all of this, and
read it over later.)
If we have both descend from a common Schema class then this also
generates some more interesting questions, like what we do about
MultiAttributeTypes. Because a MultiAttributeType is in some ways a
FeatureCollection (especially if FeatureCollections offer a validate
method). For a quick review, a MultiAttributeType was my construct to
basically handle a list. It's never really been used (but I will use it
when we get to joins); it's for when there is more than one object in
an Attribute. Think maxOccurs greater than one in XML: you can put a
bunch of little objects in there. In the easy-to-grasp case you'd have
more than one sub-feature, literally a TypedFeatureCollection: all
would be the same type, and each in the list would have to validate
against the same FeatureType. But they also could be any sort of
objects; they could be strings (think all the names of people living at
an address). This is a list of objects that must validate against
something in common (all must be strings). So maybe we could have a
ValidatingCollection, of which FeatureCollection is a subtype. Ok, my
ideas are straying here, but one thing to think about is maybe putting
in hooks for more complex validations; we already have this field
length thing, which is fairly meaningless. I believe David was going to
do some work on this for his schema stuff; basically it'd be nice to
have in the central api a way to express more complex validations that
we don't currently support.
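
A tiny sketch of that straying idea (names invented):

import java.util.Collection;

// Speculative: a collection whose members must all pass a common
// check. FeatureCollection would be the subtype whose check is
// "validates against my FeatureType(s)"; a MultiAttributeType's value
// list would be one whose check is "is an instance of my element
// type" (e.g. String).
interface ValidatingCollection extends Collection {
    void validateMember(Object member) throws IllegalArgumentException;
}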
(end skip section)

Of course, doing this does beg some more interesting questions, like
what if we actually implemented the Collection add and remove? I mean,
if the FeatureSource backing it is a FeatureStore, then an add or
remove could easily update the backend data format. Of course, this
obviously gets messy when we take into account locking and
transactions, and also the CollectionListener code we have (though I
don't think it's really been used, and it actually could potentially
give us a way out of this problem).
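
A hedged sketch of a write-through add(), meant to drop into the
DefaultFeatureCollection sketch above. It assumes the backing source
may be a FeatureStore and uses DataUtilities.reader(Feature[]) to
adapt; locking, transactions, and CollectionListener notification are
deliberately ignored:

import java.io.IOException;
import org.geotools.data.DataUtilities;
import org.geotools.data.FeatureStore;
import org.geotools.feature.Feature;

// Collection.add() writing through to the backend when the source
// happens to be writable.
public boolean add(Object o) {
    if (!(source instanceof FeatureStore)) {
        // Read-only source: add stays an optional operation.
        throw new UnsupportedOperationException("source is read-only");
    }
    try {
        ((FeatureStore) source).addFeatures(
                DataUtilities.reader(new Feature[] { (Feature) o }));
        return true;
    } catch (IOException e) {
        // Collection.add cannot throw a checked exception.
        throw new RuntimeException(e);
    }
}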

The one question that remains is backwards compatibility, if we can
pull it off. I'm hoping that re-using a few constructs we already have
can help ease this pain. But the main point is that we should take
things back to this great paradigm that IanS brought to the fore, of
really making use of the Collection framework. It makes it far, far
easier for Java programmers to get acquainted: they can just see that
getFeatures() returns a Collection, and they instantly know what to do
with it, and then from there can learn the more optimized ways of doing
things.

FeatureCollection myFeatures =
    dataStore.getFeatureSource("roads").getFeatures();
Feature myFeature = myFeatures.features().next();

Also, a potentially interesting idea for FileDataStores: having them
implement FeatureSource? Or rather maybe call it SingleDataStore? It
just might be cool if you could say

SingleDataStore store = new ShapefileDataStore(myUrl);
Feature myFeature = store.getFeatures().features().next();

(semi(un)related thoughts, just to get people started on what I'm going
to write up next)
Ok, one last thought for everyone to ponder: the file and directory
datastore question brings this up. Basically the thought of a
multi-datastore: what happens when you combine two datastores? It could
still easily implement DataStore, just with more featureTypes, right?
When we get into joins and views it does start to beg the question of
what a DataStore actually is. Especially when you add the
DataRepository and discovery stuff (is that stuff gone now? I haven't
been able to update my repository, and it's still all over my
DataStore). I don't have any coherent thoughts on all this, but we
should really think about what a DataStore is, and I think we should
maybe split it up into more interfaces, instead of the current
direction of adding ever more methods, so that each interface
represents clearly what the DataStore is and what you can do with it.
Perhaps something like PhysicalDataStore, which FileDataStore and
JDBCDataStore extend, and then a parallel VirtualDataStore that is the
result of combinations of DataStores (or perhaps that can also return
'views'); a sketch of that split follows below. And/or a
MultiDataStore, which DataRepository could then extend, and add the
locking functionality. Discovery could then extend DataRepository or
something. Speaking of which, I would still appreciate a write-up of
the 'discovery' functionality that Refractions needs, and what their
metadata plans/needs are, as I never got that explanation in the IRC
from David. At the very least I'd like to be sure we are on the same
page about what metadata means.
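
To make the shape of that split concrete (every interface name below
is invented for illustration):

import org.geotools.data.DataStore;

// Speculative split of DataStore responsibilities.
interface PhysicalDataStore extends DataStore {
    // FileDataStore and JDBCDataStore would extend this.
}

interface VirtualDataStore extends DataStore {
    // The concrete stores this virtual store combines or views.
    DataStore[] getSourceDataStores();
}

interface MultiDataStore extends DataStore {
    // Fold another store's feature types into this one.
    void addDataStore(DataStore store);
}

// DataRepository could then extend MultiDataStore and add locking;
// Discovery could extend DataRepository.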
(end semi(un)related thoughts)

Ok, enough for today. Next up for me: I will examine some of the
questions I raise above, especially the implications of FeatureViews
(my initial thoughts are actually in the direction of taking out
QueryAs and only giving people 'as' access in FeatureSources), and how
to go about decoupling Features from the data sources they are defined
in (mappings and joins and whatnot). I think it'd be really cool to
take James's idea of defining common FeatureTypes even further, so you
could define your FeatureSource by the FeatureType you wanted, and
supply it with an appropriate mapping object (and an object describing
what should be mapped from: the join to perform, the table to select
from, the bypassSql statement, etc.)


1 Comment

1. On Sun, 2004-11-07 at 05:41, Chris Holmes wrote:

> Ok, sorry for the delayed response, I've been on the road.
>
> ----- Original Message -----
> From: "David Zwiers" <dzwiers@refractions.net>
> To: <geotools-devel@lists.sourceforge.net>
> Sent: Tuesday, November 02, 2004 9:56 PM
> Subject: Geotools-devel org.geotools.data - Refactoring
>
> > This is a list of proposed changes to the Data access api. Some of
> > my reasoning is in-line. This is a result of a discussion two
> > Mondays back at the geotools IRC meeting. Please include any/all
> > comments inline.
> >
> > 1) Move Class src/org/geotools/data/DataTestCase.java -->
> > test/org/geotools/data/DataTestCase.java
> >
> > Has no apparent effect.
>
> I think I'd echo James's concerns. I'm not sure if the other modules
> do depend on DataTestCase, but I feel like they should - we really
> should move forward on Jody's idea for a common testing framework for
> all DataStores. Indeed I think perhaps we should expand DataTestCase.
> I've spent a lot of time fixing up the same problems in different
> modules; it would be great if we could start to automate those things
> in common. But it probably should move into a subpackage - data/utils
> or data/testing or something.

I understand where you are coming from ... I also thought that the
'test' directories were around for all of maven, just not included in
the jars, but could be wrong.

> > 2) Delete Method DataStoreFactorySpi.createMetadata( Map params )
> > throws IOException; DataSourceMetadataEnity
> > Delete Class DataSourceMetadataEnity
> >
> > This will involve removing code from each DataStore implementation.
> > I do not believe this class is used by any client code (Jody used
> > this as a hack for uDig ... it is not used anymore).
>
> Fine by me, I don't use this at all.
>
> > 3) Move Interface Repository --> /plugin/validation/src/..
> > Move Class DefaultRepository --> /plugin/validation/src/..
> > Move Class ExperimentalRepostiroy --> /plugin/validation/src/..
> > Move Class FeatureSourceRepository --> /plugin/validation/src/..
> > Move Class TypeRef --> /plugin/validation/src/..
> >
> > Affects: Compilation errors for validation
> >
> > This appears to be the only user (maybe geoserver chris?) ... but I
> > think these classes need some TLC before they are ready for main.
>
> GeoServer does use this I believe. But I use the validation jar so it
> should be fine. I agree that it should move out of main, but I agree
> with James that it should move to a shared space where we can hack
> away, that validation would maybe depend on. I agree it needs to be
> redone; it needs to handle grids as well. I think this may be the
> first task I'd like to get done for GeoServer: base its core data
> access on a new Repository type construct. That said, I'm also fine
> with just rolling these classes into GeoServer, to tide things over
> until the changes are complete.

The experimental portion is a good idea; it would allow us to have a
simpler time figuring this all out next time (i.e., it would already
be separate).

> > 4) Move Class BatchValidator --> test/.../BatchValidator
>
> I've never used this before, fine by me.
>
> > 5) Delete Interface FeatureView
> > Delete Class DefaultFeatureView
> >
> > Although I think we want to go in this direction ... we are not
> > there yet so it should not be in main ... maybe in a spike.
>
> Agreed, perhaps same place as the repository stuff?
>
> > 6) Delete Interface Extent
> > Delete Package org.geotools.expr (src + test)
> >
> > This is experimental ... from the package.html file "Provides an
> > Extended Expression/Filter syntax supporting experimentation with
> > DataStore JOIN and FeatureType schema modification".
>
> Fine by me.
>
> > 7) Rename or Delete Class DataSourceException
> >
> > We don't have any DataSources.
> >
> > Affects: Many compilation issues.
>
> Let's start with deprecation and make a DataStoreException that we
> can start putting into our code.

Sure, works for me.

> > 8) Delete Classes/Interfaces FileDataStore*
> >
> > I wrote these to facilitate the DirectoryDataStore ... which ought
> > to be reworked, possibly with a new factory using TypeEntry.
> >
> > This is a significant effort.
>
> I've never used this before, fine by me.
>
> > 9) Clarify the DataStoreFactorySpi.createNewDataStore(Map) method
> >
> > I can see two meanings ... new File created or new instance (as
> > opposed to a singleton based on the param collection)
>
> Yeah, we've had a lot of trouble with new datastore creation, what it
> actually means. Perhaps we should just support it for files, since it
> really doesn't mean all that much for databases, as we are not going
> to create whole new dbs for people; with most dbs that's pretty
> impossible to do programmatically.

This was mostly about javadocs for me, so I think we just need to both
clarify and include the set of 'general practices'.

> > 10) Add "throws IOException" to DataStoreFactorySpi.canProcess(Map)
> >
> > Often this requires some real checks ...
>
> Sure.
>
> > 11) Remove Method DataStore.entries()
> > Remove Method DataStore.search(QueryRequest)
> >
> > Left over from Catalog
> >
> > No effects.
>
> Cool.
>
> > 12) Add Namespace Support to DataStore
> >
> > FeatureType already supports this, so let's add it with these
> > changes:
> >
> > Add Method String[] getTypeNames( URI namespace );
> > Add Method FeatureType getSchema( URI namespace, String typeName )
> > throws IOException;
> > Deprecate Method FeatureType getSchema( String typeName ) throws
> > IOException;
>
> This sounds ok to me. But this also brings up for me the point about
> our current DataStore API being backwards incompatible. Right now
> datastores from 2.1 do not work at all with a 2.0 main. This is bad,
> and really should mean that we go to GeoTools 3.0. Which is silly as
> the changes are minor. I'd like to see if we might be able to reverse
> this direction. Though there may be a few more fixes we need. Our
> last, much more significant, data change, from sources to stores, was
> actually a lot nicer on users, since they could continue to use
> DataSources. The DataStores just were better and slowly got even
> better, so it made sense for people to switch. This was before we
> even had version numbers. I'm not sure how possible/easy this might
> be, but it really would be great if geoserver 1.2.x users could make
> use of DataStores from the 2.1.x branch, since most will not want to
> upgrade until GeoServer hits 1.3.0, which will probably take awhile,
> to iron out all the bugs. Even if some sort of additional 2.1.x jar
> was needed for the additional functionality - but right now things
> are incompatible, so they would not even work together.
>
> The reason I bring this up here is that already the getNameSpace in
> FeatureType (in the Feature package, even more core than data) does
> not work with 2.0.x. It changed from a String to a URI. This causes
> compilation and class-not-found errors across the board. So yeah, I'd
> like a real discussion about our backwards compatibility feelings. I
> brought this up in my last big email. And we have still not talked
> about it. I am actually sort of fine with it if we all agree, but I
> would hate to see it just slide against our will. I don't think it
> would be that hard, just have getNameSpaceURI, and deprecate
> getNameSpace? Or have the Strings returned all be of URI form, and
> able to be passed to a URI constructor. This is what I do with
> GeoServer; I always use URIs for the namespace, but they are just
> stored as strings. I agree that we should have originally maybe used
> URIs, but we didn't. So my personal vote is that we just live with it
> (maybe even having your new proposed methods take namespace strings),
> since we do have a set API that we should be more hesitant to change
> than we are right now.

There is a large difference in using a String and a URI
programmatically ... mainly you get exceptions when building URIs (URI
has a spec going with it), and if we are returning them, then we ought
to return the real object. I do see your backwards compatibility issue
... perhaps a deprecation for now, and in 2.2.x they get removed?

> > 13) Remove Access to Low-level API in DataStore
> >
> > I do not think this type of access is appropriate. This is my
> > opinion though. For the time being, I would deprecate the following
> > methods.
> >
> > FeatureReader getFeatureReader( Query query, Transaction
> > transaction ) throws IOException;
> > FeatureWriter getFeatureWriter( String typeName, Filter filter,
> > Transaction transaction ) throws IOException;
> > FeatureWriter getFeatureWriter( String typeName, Transaction
> > transaction ) throws IOException;
> > FeatureWriter getFeatureWriterAppend( String typeName, Transaction
> > transaction ) throws IOException;
>
> I'm fine with FeatureWriter going; that's always caused me more
> confusion than it's worth. I'm a bit more hesitant to see
> FeatureReader go, more or less because it was originally planned to
> make Joins possible, and I'm finally at the point where I want to
> work on Joins, or at least start down that path. So I'd prefer that
> you humor me and keep it around a bit, until we come up with a
> coherent plan for joins, and a way to simplify our
> featureresults/featurecollection/featureiterator/featurereader mess,
> as they all do very similar things. And I'm still unclear exactly why
> you're so against it. I know I've asked this before, but could you
> explain in detail what was so hard about implementing it in the GML
> DataStore? I mean, even if you remove the method from datastore
> you're still going to have to implement it, with
> DataStore.getFeatureSource(query.getTypeName()).getFeatureResults(Query).getFeatureReader(),
> right? It seems to me like having it in DataStore is just a
> convenience method for that. Or do you have plans to do away with all
> FeatureReaders? I mean, we're going to have to provide a way to read
> features, no? For me the only issue seems to be that it's maybe not
> so obvious that you could implement the DataStore.getFeatureReader in
> that way, which is sort of a non-issue in my mind, as our abstract
> classes handle all of that. This all said, I can see the point about
> mixing high and low level apis. But I think there is something to be
> said for knowing exactly what you want out of a datastore and being
> able to get it. But I'm fine with waiting to see how joins and all
> play out.
>
> > These are some of the changes I would like to see the data access
> > api start to take. I am obviously looking for your support in
> > moving this api forward, but most importantly I am looking for
> > feedback as to why this may be good/bad, and also how this might
> > affect your projects. Please provide your feedback in-line (makes
> > it easier to read).
>
> So overall, for the most part good, and you have my support. I would
> like backwards compatibility, but I'm not wedded to it. It does seem
> like we could do some elegant stuff to ensure that, new classes where
> appropriate (in other words no datastore extending catalog or
> discovery or any of that). I will be able to join you working on this
> relatively soon; I started rolling forward my 2.0.x fixes to trunk.
> Many questions remain in my mind: how we actually do joins, how we
> handle a data repository, how we get 'view's (which could include
> joined views, or renamed views), doing discovery right, and more. But
> these steps for the most part are all in the right directions in my
> mind. Though in many ways I feel that the data api has been sliding
> since I started working on 2.0.x; this appears to get it back to a
> good point. To move it forward we really need many minds working
> together on this...

The main reason to remove the low-level api from datastore is that it
does not support transactions, which can cause problems when keeping
track of open readers/writers w.r.t. transactions. The readers +
writers would still be accessible from the FeatureSource/Store apis.

> from Jesse:
> > I haven't looked really closely but most of the suggestions make
> > sense... to me at any rate. I am a little concerned about all the
> > metadata and discovery code being removed though. I can completely
> > understand the desire to separate concerns but as a writer of a
> > client I need some mechanism for discovering what data a datastore
> > has available.
> >
> > One idea is data stores could have a getMetadataSource() method and
> > there could be a Discovery class that could take a datastore and
> > use the MetadataSource to provide discovery functionality.
>
> Ok, I'm the one with the big push to get rid of all the metadata and
> discovery stuff. But I do think there is a place for it indeed. And I
> think I can probably help with figuring that out, if you let me know
> exactly what your requirements are. Try to re-read my long email
> about killing catalog so you see where I'm coming from.
>
> The first thing we need to do to move forward is to square away
> exactly what we mean by 'metadata'. I'd like to propose that we
> reserve the term 'metadata' solely for things that are directly
> related to the classes (and concepts) contained in
> org.geotools.metadata. Which is to say truly data about the data
> itself: when it was collected, its accuracy, the contact information
> on who collected the data, the distribution data, etc. Basically
> something that could be represented as a document that you would
> physically look at and read to figure out if you would actually want
> to use the data. Then we also have the connection parameters, which
> were squeezed into MetaData entries or entities or some such, which I
> hope are gone. I think we should come up with a new name for these -
> ConnectionParams? DataStoreParams? DSParamMetaData? Then we also have
> a FeatureType, which can be argued is a form of MetaData as well; it
> is data about data. This we should just refer to as the feature type.
>
> So I think what you're asking for in 'some mechanism for discovering
> what data a datastore has available' is to search the metadata (real
> metadata, of my first definition). Is that right? If so then that is
> what the Catalog and/or Discovery construct should be used for. But
> it should be very loosely coupled with the actual methods of getting
> the data. DataStore should definitely not implement Catalog or
> Discovery. I feel the method(s) for associating each
> DataStore/FeatureSource with metadata should take place within that
> catalog construct. If it is a file like Shapefile, which can have
> some additional metadata information, then we should have something
> like a readMetaData() method that can operate on the same connection
> parameters as the DataStore itself, and would read any additional
> files/information. In time we can maybe evaluate having this more
> closely linked, but given the history of this catalog stuff sliding
> into the DataStore I'd like to keep it separate. Especially because
> many, many users of the geotools toolkit won't care about such things
> at all. I almost feel it should go in a separate module. But yes, I
> admit that I don't completely understand your requirements, and what
> you want the MetadataSource to do. But I feel we probably can find a
> way to keep it very decoupled from the actual DataStore, and I feel
> that's rarely a very bad thing to do.

The MetaDataSource thingy was a bit my idea ... it was mainly to
separate the provider of data from the interpreter ... will explain
more in the meeting if you want.

David

> best regards,
>
> Chris