Exercise 1
需要完成的代码有:
- src/java/simpledb/storage/TupleDesc.java
- src/java/simpledb/storage/Tuple.java
Tuple是simpleDB的元组,由多个Field(字段)组成,TupleDesc负责描述Tuple中各个Field对应的schema。
Tuple.java代码:
package simpledb.storage;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;
/**
* Tuple maintains information about the contents of a tuple. Tuples have a
* specified schema specified by a TupleDesc object and contain Field objects
* with the data for each field.
*/
public class Tuple implements Serializable {
private static final long serialVersionUID = 1L;
TupleDesc td; // TupleDesc describes the schema of a tuple.
RecordId rid; // RecordId is a reference to a specific tuple on a specific page of a specific table.
CopyOnWriteArrayList<Field> fields; // Field objects with the data for each field.
/**
* Create a new tuple with the specified schema (type).
*
* @param td the schema of this tuple. It must be a valid TupleDesc
* instance with at least one field.
*/
public Tuple(TupleDesc td) {
// TODO: some code goes here
this.td = td;
this.fields = new CopyOnWriteArrayList<>();
}
/**
* @return The TupleDesc representing the schema of this tuple.
*/
public TupleDesc getTupleDesc() {
// TODO: some code goes here
return td;
}
/**
* @return The RecordId representing the location of this tuple on disk. May
* be null.
*/
public RecordId getRecordId() {
// TODO: some code goes here
return rid;
}
/**
* Set the RecordId information for this tuple.
*
* @param rid the new RecordId for this tuple.
*/
public void setRecordId(RecordId rid) {
// TODO: some code goes here
this.rid = rid;
}
/**
* Change the value of the ith field of this tuple.
*
* @param i index of the field to change. It must be a valid index.
* @param f new value for the field.
*/
public void setField(int i, Field f) {
// TODO: some code goes here
if (i >= 0 && i < fields.size()) {
fields.set(i, f);
} else if (i == fields.size()) {
fields.add(f);
} else {
throw new IllegalArgumentException("Invalid field index");
}
}
/**
* @param i field index to return. Must be a valid index.
* @return the value of the ith field, or null if it has not been set.
*/
public Field getField(int i) {
// TODO: some code goes here
if (fields == null || i < 0 || i >= fields.size())
return null;
return fields.get(i);
}
/**
* Returns the contents of this Tuple as a string. Note that to pass the
* system tests, the format needs to be as follows:
* <p>
* column1\tcolumn2\tcolumn3\t...\tcolumnN
* <p>
* where \t is any whitespace (except a newline)
*/
public String toString() {
// TODO: some code goes here
StringBuilder sb = new StringBuilder();
Iterator<TupleDesc.TDItem> it = td.iterator();
while (it.hasNext()) {
sb.append(it.next().fieldName);
if (it.hasNext()) {
sb.append("\t");
}
}
return sb.toString();
}
/**
* @return An iterator which iterates over all the fields of this tuple
*/
public Iterator<Field> fields() {
// TODO: some code goes here
return fields.iterator();
}
/**
* reset the TupleDesc of this tuple (only affecting the TupleDesc)
*/
public void resetTupleDesc(TupleDesc td) {
// TODO: some code goes here
this.td = td;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Tuple)) return false;
Tuple tuple = (Tuple) o;
return td.equals(tuple.td) &&
rid.equals(tuple.rid) &&
fields.equals(tuple.fields);
}
}
这里fields使用concurrent包的CopyOnWriteArrayList存储,保证线程安全。
TupleDesc.java代码:
package simpledb.storage;
import simpledb.common.Type;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.CopyOnWriteArrayList;
/**
* TupleDesc describes the schema of a tuple.
*/
public class TupleDesc implements Serializable {
CopyOnWriteArrayList<TDItem> tdItems;
/**
* A help class to facilitate organizing the information of each field
*/
public static class TDItem implements Serializable {
private static final long serialVersionUID = 1L;
/**
* The type of the field
*/
public final Type fieldType;
/**
* The name of the field
*/
public final String fieldName;
public TDItem(Type t, String n) {
this.fieldName = n;
this.fieldType = t;
}
public String toString() {
return fieldName + "(" + fieldType + ")";
}
}
/**
* @return An iterator which iterates over all the field TDItems
* that are included in this TupleDesc
*/
public Iterator<TDItem> iterator() {
// TODO: some code goes here
if (tdItems != null)
return tdItems.iterator();
return null;
}
private static final long serialVersionUID = 1L;
/**
* Create a new TupleDesc with typeAr.length fields with fields of the
* specified types, with associated named fields.
*
* @param typeAr array specifying the number of and types of fields in this
* TupleDesc. It must contain at least one entry.
* @param fieldAr array specifying the names of the fields. Note that names may
* be null.
*/
public TupleDesc(Type[] typeAr, String[] fieldAr) {
// TODO: some code goes here
tdItems = new CopyOnWriteArrayList<>();
for (int i = 0; i < typeAr.length; i++) {
tdItems.add(new TDItem(typeAr[i], fieldAr[i]));
}
}
/**
* Constructor. Create a new tuple desc with typeAr.length fields with
* fields of the specified types, with anonymous (unnamed) fields.
*
* @param typeAr array specifying the number of and types of fields in this
* TupleDesc. It must contain at least one entry.
*/
public TupleDesc(Type[] typeAr) {
// TODO: some code goes here
tdItems = new CopyOnWriteArrayList<>();
for (int i = 0; i < typeAr.length; i++) {
tdItems.add(new TDItem(typeAr[i], null));
}
}
public TupleDesc() {
tdItems = new CopyOnWriteArrayList<>();
}
/**
* @return the number of fields in this TupleDesc
*/
public int numFields() {
// TODO: some code goes here
return tdItems.size();
}
/**
* Gets the (possibly null) field name of the ith field of this TupleDesc.
*
* @param i index of the field name to return. It must be a valid index.
* @return the name of the ith field
* @throws NoSuchElementException if i is not a valid field reference.
*/
public String getFieldName(int i) throws NoSuchElementException {
// TODO: some code goes here
return tdItems.get(i).fieldName;
}
/**
* Gets the type of the ith field of this TupleDesc.
*
* @param i The index of the field to get the type of. It must be a valid
* index.
* @return the type of the ith field
* @throws NoSuchElementException if i is not a valid field reference.
*/
public Type getFieldType(int i) throws NoSuchElementException {
// TODO: some code goes here
return tdItems.get(i).fieldType;
}
/**
* Find the index of the field with a given name.
*
* @param name name of the field.
* @return the index of the field that is first to have the given name.
* @throws NoSuchElementException if no field with a matching name is found.
*/
public int indexForFieldName(String name) throws NoSuchElementException {
// TODO: some code goes here
if (name == null) {
throw new NoSuchElementException("No filed with a matching name");
}
String altName = name.substring(name.lastIndexOf(".") + 1);
for (int i = 0; i < tdItems.size(); ++i) {
if (name.equals(getFieldName(i)) || altName.equals(getFieldName(i))) {
return i;
}
}
throw new NoSuchElementException("No filed with a matching name");
}
/**
* @return The size (in bytes) of tuples corresponding to this TupleDesc.
* Note that tuples from a given TupleDesc are of a fixed size.
*/
public int getSize() {
// TODO: some code goes here
int size = 0;
for (TDItem it : tdItems) {
size += it.fieldType.getLen();
}
return size;
}
/**
* Merge two TupleDescs into one, with td1.numFields + td2.numFields fields,
* with the first td1.numFields coming from td1 and the remaining from td2.
*
* @param td1 The TupleDesc with the first fields of the new TupleDesc
* @param td2 The TupleDesc with the last fields of the TupleDesc
* @return the new TupleDesc
*/
public static TupleDesc merge(TupleDesc td1, TupleDesc td2) {
// TODO: some code goes here
if (td1 == null) {
return td2;
}
if (td2 == null) {
return td1;
}
TupleDesc tupleDesc = new TupleDesc();
for (int i = 0; i < td1.numFields(); ++i) {
tupleDesc.tdItems.add(td1.tdItems.get(i));
}
for (int i = 0; i < td2.numFields(); ++i) {
tupleDesc.tdItems.add(td2.tdItems.get(i));
}
return tupleDesc;
}
/**
* Compares the specified object with this TupleDesc for equality. Two
* TupleDescs are considered equal if they have the same number of items
* and if the i-th type in this TupleDesc is equal to the i-th type in o
* for every i.
*
* @param o the Object to be compared for equality with this TupleDesc.
* @return true if the object is equal to this TupleDesc.
*/
public boolean equals(Object o) {
// TODO: some code goes here
if (o == null || !(o instanceof TupleDesc)) {
return false;
}
TupleDesc tupleDesc = (TupleDesc) o;
if (tupleDesc.numFields() != numFields() || tupleDesc.getSize() != getSize()) {
return false;
}
for (int i = 0; i < numFields(); ++i) {
if (!tupleDesc.getFieldType(i).equals(getFieldType(i))) {
return false;
}
}
return true;
}
public int hashCode() {
// If you want to use TupleDesc as keys for HashMap, implement this so
// that equal objects have equals hashCode() results
int res = 0;
for (TDItem it : tdItems) {
res += it.toString().hashCode() * 41;
}
return res;
}
/**
* Returns a String describing this descriptor. It should be of the form
* "fieldType[0](fieldName[0]), ..., fieldType[M](fieldName[M])", although
* the exact format does not matter.
*
* @return String describing this descriptor.
*/
public String toString() {
// TODO: some code goes here
StringBuilder sb = new StringBuilder();
for (int i = 0; i < tdItems.size(); i++) {
TDItem it = tdItems.get(i);
sb.append(it.fieldType.toString()).append("(").append(it.fieldName).append("), ");
}
sb.deleteCharAt(sb.length() - 2);
return sb.toString();
}
}
对Exercise 1执行单元测试:
ant runtest -Dtest=TupleTest
ant runtest -Dtest=TupleDescTest
这里的单元测试正常是不能通过的,错误项应该只有modifyRecordId()
,因为还没有实现。
Exercise 2
需要完成的是:
- src/java/simpledb/common/Catalog.java
Catalog类表示目录,也是数据库表的入口,存储各个表的表名和ID。
Catalog.java代码:
package simpledb.common;
import simpledb.storage.DbFile;
import simpledb.storage.HeapFile;
import simpledb.storage.TupleDesc;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
/**
* The Catalog keeps track of all available tables in the database and their
* associated schemas.
* For now, this is a stub catalog that must be populated with tables by a
* user program before it can be used -- eventually, this should be converted
* to a catalog that reads a catalog table from disk.
*
* @Threadsafe
*/
public class Catalog {
ConcurrentHashMap<Integer, Table> tableIdToTable;
ConcurrentHashMap<String, Integer> tableNameToTableId;
/**
* Constructor.
* Creates a new, empty catalog.
*/
public Catalog() {
// TODO: some code goes here
tableIdToTable = new ConcurrentHashMap<>();
tableNameToTableId = new ConcurrentHashMap<>();
}
/**
* Add a new table to the catalog.
* This table's contents are stored in the specified DbFile.
*
* @param file the contents of the table to add; file.getId() is the identfier of
* this file/tupledesc param for the calls getTupleDesc and getFile
* @param name the name of the table -- may be an empty string. May not be null. If a name
* conflict exists, use the last table to be added as the table for a given name.
* @param pkeyField the name of the primary key field
*/
public void addTable(DbFile file, String name, String pkeyField) {
// TODO: some code goes here
tableIdToTable.put(file.getId(), new Table(file, name, pkeyField));
tableNameToTableId.put(name, file.getId());
}
public void addTable(DbFile file, String name) {
addTable(file, name, "");
}
/**
* Add a new table to the catalog.
* This table has tuples formatted using the specified TupleDesc and its
* contents are stored in the specified DbFile.
*
* @param file the contents of the table to add; file.getId() is the identfier of
* this file/tupledesc param for the calls getTupleDesc and getFile
*/
public void addTable(DbFile file) {
addTable(file, (UUID.randomUUID()).toString());
}
/**
* Return the id of the table with a specified name,
*
* @throws NoSuchElementException if the table doesn't exist
*/
public int getTableId(String name) throws NoSuchElementException {
// TODO: some code goes here
if (name != null && tableNameToTableId.containsKey(name)) {
return tableNameToTableId.get(name);
}
throw new NoSuchElementException("Table doesn't exist");
}
/**
* Returns the tuple descriptor (schema) of the specified table
*
* @param tableid The id of the table, as specified by the DbFile.getId()
* function passed to addTable
* @throws NoSuchElementException if the table doesn't exist
*/
public TupleDesc getTupleDesc(int tableid) throws NoSuchElementException {
// TODO: some code goes here
if (tableIdToTable.containsKey(tableid)) {
return tableIdToTable.get(tableid).getFile().getTupleDesc();
}
throw new NoSuchElementException("Table doesn't exist");
}
/**
* Returns the DbFile that can be used to read the contents of the
* specified table.
*
* @param tableid The id of the table, as specified by the DbFile.getId()
* function passed to addTable
*/
public DbFile getDatabaseFile(int tableid) throws NoSuchElementException {
// TODO: some code goes here
if (tableIdToTable.containsKey(tableid)) {
return tableIdToTable.get(tableid).getFile();
}
throw new NoSuchElementException("Table doesn't exist");
}
public String getPrimaryKey(int tableid) {
// TODO: some code goes here
if (tableIdToTable.containsKey(tableid)) {
return tableIdToTable.get(tableid).getPkeyField();
}
throw new NoSuchElementException("Table doesn't exist");
}
public Iterator<Integer> tableIdIterator() {
// TODO: some code goes here
return tableIdToTable.keySet().iterator();
}
public String getTableName(int id) {
// TODO: some code goes here
if (tableIdToTable.containsKey(id)) {
return tableIdToTable.get(id).getName();
}
throw new NoSuchElementException("Table doesn't exist");
}
/**
* Delete all tables from the catalog
*/
public void clear() {
// TODO: some code goes here
tableIdToTable.clear();
tableNameToTableId.clear();
}
/**
* Reads the schema from a file and creates the appropriate tables in the database.
*
* @param catalogFile
*/
public void loadSchema(String catalogFile) {
String line = "";
String baseFolder = new File(new File(catalogFile).getAbsolutePath()).getParent();
try {
BufferedReader br = new BufferedReader(new FileReader(catalogFile));
while ((line = br.readLine()) != null) {
//assume line is of the format name (field type, field type, ...)
String name = line.substring(0, line.indexOf("(")).trim();
//System.out.println("TABLE NAME: " + name);
String fields = line.substring(line.indexOf("(") + 1, line.indexOf(")")).trim();
String[] els = fields.split(",");
List<String> names = new ArrayList<>();
List<Type> types = new ArrayList<>();
String primaryKey = "";
for (String e : els) {
String[] els2 = e.trim().split(" ");
names.add(els2[0].trim());
if (els2[1].trim().equalsIgnoreCase("int"))
types.add(Type.INT_TYPE);
else if (els2[1].trim().equalsIgnoreCase("string"))
types.add(Type.STRING_TYPE);
else {
System.out.println("Unknown type " + els2[1]);
System.exit(0);
}
if (els2.length == 3) {
if (els2[2].trim().equals("pk"))
primaryKey = els2[0].trim();
else {
System.out.println("Unknown annotation " + els2[2]);
System.exit(0);
}
}
}
Type[] typeAr = types.toArray(new Type[0]);
String[] namesAr = names.toArray(new String[0]);
TupleDesc t = new TupleDesc(typeAr, namesAr);
HeapFile tabHf = new HeapFile(new File(baseFolder + "/" + name + ".dat"), t);
addTable(tabHf, name, primaryKey);
System.out.println("Added table : " + name + " with schema " + t);
}
} catch (IOException e) {
e.printStackTrace();
System.exit(0);
} catch (IndexOutOfBoundsException e) {
System.out.println("Invalid catalog entry : " + line);
System.exit(0);
}
}
}
class Table {
private DbFile file;
private String name;
private String pkeyField;
public Table(DbFile file, String name, String pkeyField) {
this.file = file;
this.name = name;
this.pkeyField = pkeyField;
}
public Table(DbFile file, String name) {
this(file, name, "");
}
public DbFile getFile() {
return file;
}
public String getName() {
return name;
}
public String getPkeyField() {
return pkeyField;
}
}
这里额外实现了一个Table类,也就是一张表,存储了表的内容、名字和主键信息。
此外,用了两个ConcurrentHashMap,一个保存ID和表的映射关系,一个保存表名和ID的映射关系,便于查询。
对Exercise 2执行单元测试:
ant runtest -Dtest=CatalogTest
这里的单元测试应该是successful的。
Exercise 3
需要完成的是:
- src/java/simpledb/storage/BufferPool.java
BufferPool是为了避免频繁读写硬盘,讲一部分数据读出后缓存在内存中,这部分数据是以页为单位,若干的页组成了BufferPool。
BufferPool代码:
package simpledb.storage;
import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.common.DeadlockException;
import simpledb.common.Permissions;
import simpledb.transaction.TransactionAbortedException;
import simpledb.transaction.TransactionId;
import java.io.IOException;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
/**
* BufferPool manages the reading and writing of pages into memory from
* disk. Access methods call into it to retrieve pages, and it fetches
* pages from the appropriate location.
* <p>
* The BufferPool is also responsible for locking; when a transaction fetches
* a page, BufferPool checks that the transaction has the appropriate
* locks to read/write the page.
*
* @Threadsafe, all fields are final
*/
public class BufferPool {
/**
* Bytes per page, including header.
*/
private static final int DEFAULT_PAGE_SIZE = 4096;
private static int pageSize = DEFAULT_PAGE_SIZE;
/**
* Default number of pages passed to the constructor. This is used by
* other classes. BufferPool should use the numPages argument to the
* constructor instead.
*/
public static final int DEFAULT_PAGES = 50;
private final int numPages;
private final Map<PageId, Page> bufferPool = new ConcurrentHashMap<>();
/**
* Creates a BufferPool that caches up to numPages pages.
*
* @param numPages maximum number of pages in this buffer pool.
*/
public BufferPool(int numPages) {
// TODO: some code goes here
this.numPages = numPages;
}
public static int getPageSize() {
return pageSize;
}
// THIS FUNCTION SHOULD ONLY BE USED FOR TESTING!!
public static void setPageSize(int pageSize) {
BufferPool.pageSize = pageSize;
}
// THIS FUNCTION SHOULD ONLY BE USED FOR TESTING!!
public static void resetPageSize() {
BufferPool.pageSize = DEFAULT_PAGE_SIZE;
}
/**
* Retrieve the specified page with the associated permissions.
* Will acquire a lock and may block if that lock is held by another
* transaction.
* <p>
* The retrieved page should be looked up in the buffer pool. If it
* is present, it should be returned. If it is not present, it should
* be added to the buffer pool and returned. If there is insufficient
* space in the buffer pool, a page should be evicted and the new page
* should be added in its place.
*
* @param tid the ID of the transaction requesting the page
* @param pid the ID of the requested page
* @param perm the requested permissions on the page
*/
public Page getPage(TransactionId tid, PageId pid, Permissions perm)
throws TransactionAbortedException, DbException {
// TODO: some code goes here
if (!bufferPool.containsKey(pid)) {
DbFile dbFile = Database.getCatalog().getDatabaseFile(pid.getTableId());
Page page = dbFile.readPage(pid);
bufferPool.put(pid, page);
}
return bufferPool.get(pid);
}
/**
* Releases the lock on a page.
* Calling this is very risky, and may result in wrong behavior. Think hard
* about who needs to call this and why, and why they can run the risk of
* calling it.
*
* @param tid the ID of the transaction requesting the unlock
* @param pid the ID of the page to unlock
*/
public void unsafeReleasePage(TransactionId tid, PageId pid) {
// TODO: some code goes here
// not necessary for lab1|lab2
}
/**
* Release all locks associated with a given transaction.
*
* @param tid the ID of the transaction requesting the unlock
*/
public void transactionComplete(TransactionId tid) {
// TODO: some code goes here
// not necessary for lab1|lab2
}
/**
* Return true if the specified transaction has a lock on the specified page
*/
public boolean holdsLock(TransactionId tid, PageId p) {
// TODO: some code goes here
// not necessary for lab1|lab2
return false;
}
/**
* Commit or abort a given transaction; release all locks associated to
* the transaction.
*
* @param tid the ID of the transaction requesting the unlock
* @param commit a flag indicating whether we should commit or abort
*/
public void transactionComplete(TransactionId tid, boolean commit) {
// TODO: some code goes here
// not necessary for lab1|lab2
}
/**
* Add a tuple to the specified table on behalf of transaction tid. Will
* acquire a write lock on the page the tuple is added to and any other
* pages that are updated (Lock acquisition is not needed for lab2).
* May block if the lock(s) cannot be acquired.
* <p>
* Marks any pages that were dirtied by the operation as dirty by calling
* their markDirty bit, and adds versions of any pages that have
* been dirtied to the cache (replacing any existing versions of those pages) so
* that future requests see up-to-date pages.
*
* @param tid the transaction adding the tuple
* @param tableId the table to add the tuple to
* @param t the tuple to add
*/
public void insertTuple(TransactionId tid, int tableId, Tuple t)
throws DbException, IOException, TransactionAbortedException {
// TODO: some code goes here
// not necessary for lab1
}
/**
* Remove the specified tuple from the buffer pool.
* Will acquire a write lock on the page the tuple is removed from and any
* other pages that are updated. May block if the lock(s) cannot be acquired.
* <p>
* Marks any pages that were dirtied by the operation as dirty by calling
* their markDirty bit, and adds versions of any pages that have
* been dirtied to the cache (replacing any existing versions of those pages) so
* that future requests see up-to-date pages.
*
* @param tid the transaction deleting the tuple.
* @param t the tuple to delete
*/
public void deleteTuple(TransactionId tid, Tuple t)
throws DbException, IOException, TransactionAbortedException {
// TODO: some code goes here
// not necessary for lab1
}
/**
* Flush all dirty pages to disk.
* NB: Be careful using this routine -- it writes dirty data to disk so will
* break simpledb if running in NO STEAL mode.
*/
public synchronized void flushAllPages() throws IOException {
// TODO: some code goes here
// not necessary for lab1
}
/**
* Remove the specific page id from the buffer pool.
* Needed by the recovery manager to ensure that the
* buffer pool doesn't keep a rolled back page in its
* cache.
* <p>
* Also used by B+ tree files to ensure that deleted pages
* are removed from the cache so they can be reused safely
*/
public synchronized void removePage(PageId pid) {
// TODO: some code goes here
// not necessary for lab1
}
/**
* Flushes a certain page to disk
*
* @param pid an ID indicating the page to flush
*/
private synchronized void flushPage(PageId pid) throws IOException {
// TODO: some code goes here
// not necessary for lab1
}
/**
* Write all pages of the specified transaction to disk.
*/
public synchronized void flushPages(TransactionId tid) throws IOException {
// TODO: some code goes here
// not necessary for lab1|lab2
}
/**
* Discards a page from the buffer pool.
* Flushes the page to disk to ensure dirty pages are updated on disk.
*/
private synchronized void evictPage() throws DbException {
// TODO: some code goes here
// not necessary for lab1
}
}
大部分方法不在lab1中实现,因此需要写的比较少,也没有单元测试,在下一个Exercise中一并测试。此外,getPage暂时还没考虑Pool满的情况。
Exercise 4
需要完成的是:
- src/java/simpledb/storage/HeapPageId.java
- src/java/simpledb/storage/RecordId.java
- src/java/simpledb/storage/HeapPage.java
数据库常见的访问方式有Heap files
和B-trees
,simpleDB
使用了Heap files
。
HeapPage 是数据库中表的数据页,通常用于存储表中的多个记录(或称为行)。它是数据在磁盘上的物理存储单位之一,负责管理表中的数据。每个 HeapPage 包含了一定数量的记录,以及一些元数据用于管理和访问这些记录。
HeapPageId 是用于唯一标识一个 HeapPage 的标识符。它通常包含两个部分:表的标识符(TableId)和页的序号(pageNumber)。通过 HeapPageId,我们可以准确定位到数据库中的某个特定的数据页。
RecordId 则是用于唯一标识一个记录(或行)的标识符。它包含了一个 HeapPageId,用于指示记录所在的数据页,以及记录在该数据页中的偏移量(tupleNo)。通过 RecordId,我们可以唯一地标识和定位数据库中的某个记录。
每个表有一个HeapFile
,一个HeapFile
包含一组HeapPage
,每个HeapPage
由BufferPool.DEFAULT_PAGE_SIZE
个字节组成,并被划分为一组槽位,每个槽位可以放一个元组。此外,每个HeapPage
还会包含一个位图作为头,每一位标识每个槽位,为1则元组有效,否则为0(例如不存在或者被删除)。page存储在buffer pool中,但是由HeapFile
来读写。
因此,假设每个元组的大小为tuple size
个字节,那么加上位图中的一位,在page中就需要tuple size * 8 + 1
的空间。因此每个page中的元组数量为(floor向下取整):
tuples per page = floor((page size * 8) / (tuple size * 8 + 1))
求出元组数量,就可以获得头部位图的字节数(ceiling向上取整):
header bytes = ceiling(tuples per page / 8)
位图中每个字节的低位表示靠前的槽位,所以如果槽位的数量不是8的倍数,最后一个字节的可能有几个高位是无意义的。
所有JVM都是大端字节序
HeapPageId.java代码:
package simpledb.storage;
import java.util.Objects;
/**
* Unique identifier for HeapPage objects.
*/
public class HeapPageId implements PageId {
private int tableId;
private int pgNo;
/**
* Constructor. Create a page id structure for a specific page of a
* specific table.
*
* @param tableId The table that is being referenced
* @param pgNo The page number in that table.
*/
public HeapPageId(int tableId, int pgNo) {
// TODO: some code goes here
this.tableId = tableId;
this.pgNo = pgNo;
}
/**
* @return the table associated with this PageId
*/
public int getTableId() {
// TODO: some code goes here
return tableId;
}
/**
* @return the page number in the table getTableId() associated with
* this PageId
*/
public int getPageNumber() {
// TODO: some code goes here
return pgNo;
}
/**
* @return a hash code for this page, represented by a combination of
* the table number and the page number (needed if a PageId is used as a
* key in a hash table in the BufferPool, for example.)
* @see BufferPool
*/
public int hashCode() {
// TODO: some code goes here
return Objects.hash(getPageNumber(), getTableId());
}
/**
* Compares one PageId to another.
*
* @param o The object to compare against (must be a PageId)
* @return true if the objects are equal (e.g., page numbers and table
* ids are the same)
*/
public boolean equals(Object o) {
// TODO: some code goes here
if (!(o instanceof HeapPageId))
return false;
HeapPageId other = (HeapPageId) o;
if (this.getPageNumber() == other.getPageNumber() && this.getTableId() == other.getTableId())
return true;
return false;
}
/**
* Return a representation of this object as an array of
* integers, for writing to disk. Size of returned array must contain
* number of integers that corresponds to number of args to one of the
* constructors.
*/
public int[] serialize() {
int[] data = new int[2];
data[0] = getTableId();
data[1] = getPageNumber();
return data;
}
}
HeapPageId存储一张表的一个具体页面。
HeapPage.java代码:
package simpledb.storage;
import simpledb.common.Catalog;
import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.common.Debug;
import simpledb.transaction.TransactionId;
import java.io.*;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
/**
* Each instance of HeapPage stores data for one page of HeapFiles and
* implements the Page interface that is used by BufferPool.
*
* @see HeapFile
* @see BufferPool
*/
public class HeapPage implements Page {
final HeapPageId pid;
final TupleDesc td;
final byte[] header;
final Tuple[] tuples;
final int numSlots;
byte[] oldData;
private final Byte oldDataLock = (byte) 0;
/**
* Create a HeapPage from a set of bytes of data read from disk.
* The format of a HeapPage is a set of header bytes indicating
* the slots of the page that are in use, some number of tuple slots.
* Specifically, the number of tuples is equal to: <p>
* floor((BufferPool.getPageSize()*8) / (tuple size * 8 + 1))
* <p> where tuple size is the size of tuples in this
* database table, which can be determined via {@link Catalog#getTupleDesc}.
* The number of 8-bit header words is equal to:
* <p>
* ceiling(no. tuple slots / 8)
* <p>
*
* @see Database#getCatalog
* @see Catalog#getTupleDesc
* @see BufferPool#getPageSize()
*/
public HeapPage(HeapPageId id, byte[] data) throws IOException {
this.pid = id;
this.td = Database.getCatalog().getTupleDesc(id.getTableId());
this.numSlots = getNumTuples();
DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
// allocate and read the header slots of this page
header = new byte[getHeaderSize()];
for (int i = 0; i < header.length; i++)
header[i] = dis.readByte();
tuples = new Tuple[numSlots];
try {
// allocate and read the actual records of this page
for (int i = 0; i < tuples.length; i++)
tuples[i] = readNextTuple(dis, i);
} catch (NoSuchElementException e) {
e.printStackTrace();
}
dis.close();
setBeforeImage();
}
/**
* Retrieve the number of tuples on this page.
*
* @return the number of tuples on this page
*/
private int getNumTuples() {
// TODO: some code goes here
return (BufferPool.getPageSize() * 8) / (td.getSize() * 8 + 1);
}
/**
* Computes the number of bytes in the header of a page in a HeapFile with each tuple occupying tupleSize bytes
*
* @return the number of bytes in the header of a page in a HeapFile with each tuple occupying tupleSize bytes
*/
private int getHeaderSize() {
// TODO: some code goes here
return (int) Math.ceil((double) getNumTuples() / 8);
}
/**
* Return a view of this page before it was modified
* -- used by recovery
*/
public HeapPage getBeforeImage() {
try {
byte[] oldDataRef = null;
synchronized (oldDataLock) {
oldDataRef = oldData;
}
return new HeapPage(pid, oldDataRef);
} catch (IOException e) {
e.printStackTrace();
//should never happen -- we parsed it OK before!
System.exit(1);
}
return null;
}
public void setBeforeImage() {
synchronized (oldDataLock) {
oldData = getPageData().clone();
}
}
/**
* @return the PageId associated with this page.
*/
public HeapPageId getId() {
// TODO: some code goes here
return this.pid;
}
/**
* Suck up tuples from the source file.
*/
private Tuple readNextTuple(DataInputStream dis, int slotId) throws NoSuchElementException {
// if associated bit is not set, read forward to the next tuple, and
// return null.
if (!isSlotUsed(slotId)) {
for (int i = 0; i < td.getSize(); i++) {
try {
dis.readByte();
} catch (IOException e) {
throw new NoSuchElementException("error reading empty tuple");
}
}
return null;
}
// read fields in the tuple
Tuple t = new Tuple(td);
RecordId rid = new RecordId(pid, slotId);
t.setRecordId(rid);
try {
for (int j = 0; j < td.numFields(); j++) {
Field f = td.getFieldType(j).parse(dis);
t.setField(j, f);
}
} catch (java.text.ParseException e) {
e.printStackTrace();
throw new NoSuchElementException("parsing error!");
}
return t;
}
/**
* Generates a byte array representing the contents of this page.
* Used to serialize this page to disk.
* <p>
* The invariant here is that it should be possible to pass the byte
* array generated by getPageData to the HeapPage constructor and
* have it produce an identical HeapPage object.
*
* @return A byte array correspond to the bytes of this page.
* @see #HeapPage
*/
public byte[] getPageData() {
int len = BufferPool.getPageSize();
ByteArrayOutputStream baos = new ByteArrayOutputStream(len);
DataOutputStream dos = new DataOutputStream(baos);
// create the header of the page
for (byte b : header) {
try {
dos.writeByte(b);
} catch (IOException e) {
// this really shouldn't happen
e.printStackTrace();
}
}
// create the tuples
for (int i = 0; i < tuples.length; i++) {
// empty slot
if (!isSlotUsed(i)) {
for (int j = 0; j < td.getSize(); j++) {
try {
dos.writeByte(0);
} catch (IOException e) {
e.printStackTrace();
}
}
continue;
}
// non-empty slot
for (int j = 0; j < td.numFields(); j++) {
Field f = tuples[i].getField(j);
try {
f.serialize(dos);
} catch (IOException e) {
e.printStackTrace();
}
}
}
// padding
int zerolen = BufferPool.getPageSize() - (header.length + td.getSize() * tuples.length); //- numSlots * td.getSize();
byte[] zeroes = new byte[zerolen];
try {
dos.write(zeroes, 0, zerolen);
} catch (IOException e) {
e.printStackTrace();
}
try {
dos.flush();
} catch (IOException e) {
e.printStackTrace();
}
return baos.toByteArray();
}
/**
* Static method to generate a byte array corresponding to an empty
* HeapPage.
* Used to add new, empty pages to the file. Passing the results of
* this method to the HeapPage constructor will create a HeapPage with
* no valid tuples in it.
*
* @return The returned ByteArray.
*/
public static byte[] createEmptyPageData() {
int len = BufferPool.getPageSize();
return new byte[len]; //all 0
}
/**
* Delete the specified tuple from the page; the corresponding header bit should be updated to reflect
* that it is no longer stored on any page.
*
* @param t The tuple to delete
* @throws DbException if this tuple is not on this page, or tuple slot is
* already empty.
*/
public void deleteTuple(Tuple t) throws DbException {
// TODO: some code goes here
// not necessary for lab1
}
/**
* Adds the specified tuple to the page; the tuple should be updated to reflect
* that it is now stored on this page.
*
* @param t The tuple to add.
* @throws DbException if the page is full (no empty slots) or tupledesc
* is mismatch.
*/
public void insertTuple(Tuple t) throws DbException {
// TODO: some code goes here
// not necessary for lab1
}
/**
* Marks this page as dirty/not dirty and record that transaction
* that did the dirtying
*/
public void markDirty(boolean dirty, TransactionId tid) {
// TODO: some code goes here
// not necessary for lab1
}
/**
* Returns the tid of the transaction that last dirtied this page, or null if the page is not dirty
*/
public TransactionId isDirty() {
// TODO: some code goes here
// Not necessary for lab1
return null;
}
/**
* Returns the number of unused (i.e., empty) slots on this page.
*/
public int getNumUnusedSlots() {
// TODO: some code goes here
int cnt = 0;
for (int i = 0; i < numSlots; ++i) {
if (!isSlotUsed(i)) {
++cnt;
}
}
return cnt;
}
/**
* Returns true if associated slot on this page is filled.
*/
public boolean isSlotUsed(int i) {
// TODO: some code goes here
int iTh = i / 8;
int bitTh = i % 8;
int onBit = (header[iTh] >> bitTh) & 1;
return onBit == 1;
}
/**
* Abstraction to fill or clear a slot on this page.
*/
private void markSlotUsed(int i, boolean value) {
// TODO: some code goes here
// not necessary for lab1
}
/**
* @return an iterator over all tuples on this page (calling remove on this iterator throws an UnsupportedOperationException)
* (note that this iterator shouldn't return tuples in empty slots!)
*/
public Iterator<Tuple> iterator() {
// TODO: some code goes here
List<Tuple> tupleList = new ArrayList<>();
for (int i = 0; i < numSlots; ++i) {
if (isSlotUsed(i)) {
tupleList.add(tuples[i]);
}
}
return tupleList.iterator();
}
}
isSlotUsed判断第i个槽位是否有效。
RecordId.java代码:
package simpledb.storage;
import java.io.Serializable;
/**
* A RecordId is a reference to a specific tuple on a specific page of a
* specific table.
*/
public class RecordId implements Serializable {
private static final long serialVersionUID = 1L;
private PageId pid; // page id
private int tupleNo; // tuple number within the page.
/**
* Creates a new RecordId referring to the specified PageId and tuple
* number.
*
* @param pid the pageid of the page on which the tuple resides
* @param tupleno the tuple number within the page.
*/
public RecordId(PageId pid, int tupleno) {
// TODO: some code goes here
this.pid = pid;
this.tupleNo = tupleno;
}
/**
* @return the tuple number this RecordId references.
*/
public int getTupleNumber() {
// TODO: some code goes here
return tupleNo;
}
/**
* @return the page id this RecordId references.
*/
public PageId getPageId() {
// TODO: some code goes here
return pid;
}
/**
* Two RecordId objects are considered equal if they represent the same
* tuple.
*
* @return True if this and o represent the same tuple
*/
@Override
public boolean equals(Object o) {
// TODO: some code goes here
if (!(o instanceof RecordId)) {
return false;
}
RecordId other = (RecordId) o;
return this.pid.equals(other.getPageId()) && this.tupleNo == other.getTupleNumber();
}
/**
* You should implement the hashCode() so that two equal RecordId instances
* (with respect to equals()) have the same hashCode().
*
* @return An int that is the same for equal RecordId objects.
*/
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + pid.getTableId();
result = prime * result + pid.getPageNumber();
result = prime * result + tupleNo;
return result;
}
}
对Exercise 4执行单元测试:
ant runtest -Dtest=HeapPageIdTest
ant runtest -Dtest=RecordIDTest
ant runtest -Dtest=HeapPageReadTest
这里的单元测试应该是successful的。
Exercise 5
需要完成的是:
- src/java/simpledb/storage/HeapFile.java
HeapFile就是DbFile用堆结构的具体实现了,一个HeapFile属于一张表,包含多个页,会在使用时加载到buffer pool中。
HeapFile.java代码:
package simpledb.storage;
import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.common.Debug;
import simpledb.common.Permissions;
import simpledb.transaction.TransactionAbortedException;
import simpledb.transaction.TransactionId;
import java.io.*;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
/**
* HeapFile is an implementation of a DbFile that stores a collection of tuples
* in no particular order. Tuples are stored on pages, each of which is a fixed
* size, and the file is simply a collection of those pages. HeapFile works
* closely with HeapPage. The format of HeapPages is described in the HeapPage
* constructor.
*
* @author Sam Madden
* @see HeapPage#HeapPage
*/
public class HeapFile implements DbFile {
private final File file;
private final TupleDesc tupleDesc;
private static final class HeapFileIterator implements DbFileIterator {
private final TransactionId tid;
private final HeapFile heapFile;
private Iterator<Tuple> tupleIterator;
private int currentPageNo;
public HeapFileIterator(TransactionId tid, HeapFile heapFile) {
this.tid = tid;
this.heapFile = heapFile;
}
@Override
public void open() throws DbException, TransactionAbortedException {
currentPageNo = 0;
tupleIterator = getTupleIterator(currentPageNo);
}
private Iterator<Tuple> getTupleIterator(int pageNo) throws TransactionAbortedException, DbException {
if (pageNo >= 0 && pageNo < heapFile.numPages()) {
HeapPageId pid = new HeapPageId(heapFile.getId(), pageNo);
HeapPage page = (HeapPage) Database.getBufferPool().getPage(tid, pid, Permissions.READ_ONLY);
return page.iterator();
} else {
throw new DbException(String.format("heapFile %d doesn't exist in page[%d]", pageNo, heapFile.getId()));
}
}
@Override
public boolean hasNext() throws DbException, TransactionAbortedException {
if (tupleIterator == null) {
return false;
}
if (tupleIterator.hasNext()) {
return true;
}
if (currentPageNo < heapFile.numPages() - 1) {
currentPageNo++;
tupleIterator = getTupleIterator(currentPageNo);
return tupleIterator.hasNext();
}
return false;
}
@Override
public Tuple next() throws DbException, TransactionAbortedException {
if (tupleIterator == null || !tupleIterator.hasNext()) {
throw new NoSuchElementException();
}
return tupleIterator.next();
}
@Override
public void rewind() throws DbException, TransactionAbortedException {
close();
open();
}
@Override
public void close() {
tupleIterator = null;
currentPageNo = 0;
}
}
/**
* Constructs a heap file backed by the specified file.
*
* @param f the file that stores the on-disk backing store for this heap
* file.
*/
public HeapFile(File f, TupleDesc td) {
// TODO: some code goes here
this.file = f;
this.tupleDesc = td;
}
/**
* Returns the File backing this HeapFile on disk.
*
* @return the File backing this HeapFile on disk.
*/
public File getFile() {
// TODO: some code goes here
return file;
}
/**
* Returns an ID uniquely identifying this HeapFile. Implementation note:
* you will need to generate this tableid somewhere to ensure that each
* HeapFile has a "unique id," and that you always return the same value for
* a particular HeapFile. We suggest hashing the absolute file name of the
* file underlying the heapfile, i.e. f.getAbsoluteFile().hashCode().
*
* @return an ID uniquely identifying this HeapFile.
*/
public int getId() {
// TODO: some code goes here
return file.getAbsoluteFile().hashCode();
}
/**
* Returns the TupleDesc of the table stored in this DbFile.
*
* @return TupleDesc of this DbFile.
*/
public TupleDesc getTupleDesc() {
// TODO: some code goes here
return this.tupleDesc;
}
// see DbFile.java for javadocs
public Page readPage(PageId pid) {
// TODO: some code goes here
int tableId = pid.getTableId();
int pageNo = pid.getPageNumber();
int offset = pageNo * BufferPool.getPageSize();
RandomAccessFile randomAccessFile = null;
try {
randomAccessFile = new RandomAccessFile(file, "r");
if ((long) (pageNo + 1) * BufferPool.getPageSize() > randomAccessFile.length()) {
randomAccessFile.close();
throw new IllegalArgumentException("pageNo is out of file length");
}
byte[] data = new byte[BufferPool.getPageSize()];
randomAccessFile.seek(offset);
int read = randomAccessFile.read(data, 0, BufferPool.getPageSize());
if (read != BufferPool.getPageSize()) {
throw new IllegalArgumentException("read page failed");
}
HeapPageId id = new HeapPageId(pid.getTableId(), pid.getPageNumber());
return new HeapPage(id, data);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (randomAccessFile != null) {
randomAccessFile.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
throw new IllegalArgumentException("read page failed");
}
// see DbFile.java for javadocs
public void writePage(Page page) throws IOException {
// TODO: some code goes here
// not necessary for lab1
}
/**
* Returns the number of pages in this HeapFile.
*/
public int numPages() {
// TODO: some code goes here
return (int) Math.floor(getFile().length() * 1.0 / BufferPool.getPageSize());
}
// see DbFile.java for javadocs
public List<Page> insertTuple(TransactionId tid, Tuple t)
throws DbException, IOException, TransactionAbortedException {
// TODO: some code goes here
return null;
// not necessary for lab1
}
// see DbFile.java for javadocs
public List<Page> deleteTuple(TransactionId tid, Tuple t) throws DbException,
TransactionAbortedException {
// TODO: some code goes here
return null;
// not necessary for lab1
}
// see DbFile.java for javadocs
public DbFileIterator iterator(TransactionId tid) {
// TODO: some code goes here
return new HeapFileIterator(tid, this);
}
}
HeapFileIterator类包含了事务ID、所要遍历的HeapFile、当前页面的元组迭代器和当前页面的编号。这里HeapFileIterator实现为HeapFile的内部类,有以下几个优点:
- 封装性:内部类可以访问外部类的所有成员(包括私有成员)。这样,HeapFileIterator可以直接访问HeapFile的成员变量和方法,而无需通过getter或setter。这使得代码更简洁,更易于维护。
- 增强代码的可读性和组织性:HeapFileIterator是专门为HeapFile服务的,将其定义为内部类可以更清楚地表明这种关系,并将相关的代码放在一起,提高代码的可读性和组织性。
- 增加代码的灵活性:内部类可以独立于外部类而变化,这使得编写可维护的代码更为灵活。例如,如果只有HeapFileIterator需要改变,那么可以只修改这个内部类,而不会影响到HeapFile。
readPage使用RandomAccessFile进行随机读写。与标准的输入/输出流不同,RandomAccessFile允许跳转到文件的任何位置来读写数据,它的主要方法包括:
- seek(long pos):将文件指针设置到文件的指定位置。
- read(byte[] b):从当前文件指针位置开始读取数据到字节数组。
- write(byte[] b):将字节数组的数据写入到当前文件指针位置。
- length():返回文件的长度。
- close():关闭这个随机访问文件流并释放任何系统资源与之关联。
对Exercise 5执行单元测试:
ant runtest -Dtest=HeapFileReadTest
这里的单元测试应该是successful的。
Exercise 6
需要完成的是:
- src/java/simpledb/execution/SeqScan.java
SeqScan类是一个实现了OpIterator接口的类,用于对数据库表顺序扫描。
SeqScan.java代码:
package simpledb.execution;
import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.common.Type;
import simpledb.storage.DbFileIterator;
import simpledb.storage.Tuple;
import simpledb.storage.TupleDesc;
import simpledb.transaction.TransactionAbortedException;
import simpledb.transaction.TransactionId;
import java.util.NoSuchElementException;
/**
* SeqScan is an implementation of a sequential scan access method that reads
* each tuple of a table in no particular order (e.g., as they are laid out on
* disk).
*/
public class SeqScan implements OpIterator {
private static final long serialVersionUID = 1L;
private final TransactionId tid;
private int tableId;
private String tableAlias;
private DbFileIterator iterator;
/**
* Creates a sequential scan over the specified table as a part of the
* specified transaction.
*
* @param tid The transaction this scan is running as a part of.
* @param tableid the table to scan.
* @param tableAlias the alias of this table (needed by the parser); the returned
* tupleDesc should have fields with name tableAlias.fieldName
* (note: this class is not responsible for handling a case where
* tableAlias or fieldName are null. It shouldn't crash if they
* are, but the resulting name can be null.fieldName,
* tableAlias.null, or null.null).
*/
public SeqScan(TransactionId tid, int tableid, String tableAlias) {
// TODO: some code goes here
this.tid = tid;
this.tableId = tableid;
this.tableAlias = tableAlias;
}
/**
* @return return the table name of the table the operator scans. This should
* be the actual name of the table in the catalog of the database
*/
public String getTableName() {
return Database.getCatalog().getTableName(tableId);
}
/**
* @return Return the alias of the table this operator scans.
*/
public String getAlias() {
// TODO: some code goes here
return this.tableAlias;
}
/**
* Reset the tableid, and tableAlias of this operator.
*
* @param tableid the table to scan.
* @param tableAlias the alias of this table (needed by the parser); the returned
* tupleDesc should have fields with name tableAlias.fieldName
* (note: this class is not responsible for handling a case where
* tableAlias or fieldName are null. It shouldn't crash if they
* are, but the resulting name can be null.fieldName,
* tableAlias.null, or null.null).
*/
public void reset(int tableid, String tableAlias) {
// TODO: some code goes here
this.tableId = tableid;
this.tableAlias = tableAlias;
}
public SeqScan(TransactionId tid, int tableId) {
this(tid, tableId, Database.getCatalog().getTableName(tableId));
}
public void open() throws DbException, TransactionAbortedException {
// TODO: some code goes here
iterator = Database.getCatalog().getDatabaseFile(tableId).iterator(tid);
iterator.open();
}
/**
* Returns the TupleDesc with field names from the underlying HeapFile,
* prefixed with the tableAlias string from the constructor. This prefix
* becomes useful when joining tables containing a field(s) with the same
* name. The alias and name should be separated with a "." character
* (e.g., "alias.fieldName").
*
* @return the TupleDesc with field names from the underlying HeapFile,
* prefixed with the tableAlias string from the constructor.
*/
public TupleDesc getTupleDesc() {
// TODO: some code goes here
TupleDesc oldDesc = Database.getCatalog().getTupleDesc(tableId);
String[] names = new String[oldDesc.numFields()];
Type[] types = new Type[oldDesc.numFields()];
for (int i = 0; i < oldDesc.numFields(); ++i) {
names[i] = getAlias() + "." + oldDesc.getFieldName(i);
types[i] = oldDesc.getFieldType(i);
}
return new TupleDesc(types, names);
}
public boolean hasNext() throws TransactionAbortedException, DbException {
// TODO: some code goes here
if (iterator != null) {
return iterator.hasNext();
}
return false;
}
public Tuple next() throws NoSuchElementException,
TransactionAbortedException, DbException {
// TODO: some code goes here
if (iterator == null) {
throw new NoSuchElementException();
}
Tuple t = iterator.next();
if (t == null) {
throw new NoSuchElementException();
}
return t;
}
public void close() {
// TODO: some code goes here
iterator = null;
}
public void rewind() throws DbException, NoSuchElementException,
TransactionAbortedException {
// TODO: some code goes here
iterator.rewind();
}
}
对Exercise 6执行单元测试,注意是系统测试:
ant runsystest -Dtest=ScanTest
这里的单元测试应该是successful的。
A simple test
在src/java/simpledb/
目录下新建test.java
文件,功能相当于SELECT * FROM some_data_file
,内容是实验手册给出的,需要添加一些import:
package simpledb;
import java.io.*;
import simpledb.common.Database;
import simpledb.common.Type;
import simpledb.execution.SeqScan;
import simpledb.storage.HeapFile;
import simpledb.storage.Tuple;
import simpledb.storage.TupleDesc;
import simpledb.transaction.TransactionId;
public class test {
public static void main(String[] argv) {
// construct a 3-column table schema
Type types[] = new Type[]{ Type.INT_TYPE, Type.INT_TYPE, Type.INT_TYPE };
String names[] = new String[]{ "field0", "field1", "field2" };
TupleDesc descriptor = new TupleDesc(types, names);
// create the table, associate it with some_data_file.dat
// and tell the catalog about the schema of this table.
HeapFile table1 = new HeapFile(new File("some_data_file.dat"), descriptor);
Database.getCatalog().addTable(table1, "test");
// construct the query: we use a simple SeqScan, which spoonfeeds
// tuples via its iterator.
TransactionId tid = new TransactionId();
SeqScan f = new SeqScan(tid, table1.getId());
try {
// and run it
f.open();
while (f.hasNext()) {
Tuple tup = f.next();
System.out.println(tup);
}
f.close();
Database.getBufferPool().transactionComplete(tid);
} catch (Exception e) {
System.out.println ("Exception : " + e);
}
}
}
可以在项目顶层目录下准备一个数据文件some_data_file.txt
,其中包含以下内容:
1,1,1
5,2,2
8,4,4
5,6,7
先使用ant打包程序:
ant
如果没有报错,就将测试数据文件转换为数据库查询的二进制文件:
java -jar dist/simpledb.jar convert some_data_file.txt 3
末尾的参数3
是因为测试文件的数据有3列,此时会生成some_data_file.dat
,然后调用test验证:
java -classpath dist/simpledb.jar simpledb.test
也可以在测试程序中打印tup的地方调用getField
方法来打印查询到的具体数据。